LLM10: Unbounded Consumption
>Control Description
Unbounded Consumption occurs when an LLM application allows users to conduct excessive and uncontrolled inferences, leading to denial of service (DoS), economic losses, model theft, and service degradation. The high computational demands of LLMs make them vulnerable to resource exploitation and unauthorized usage.
>Vulnerability Types
- 1. Variable-Length Input Flood: Overloading the LLM with numerous inputs of varying lengths
- 2. Denial of Wallet (DoW): Exploiting the cost-per-use model with high-volume operations
- 3. Continuous Input Overflow: Sending inputs that exceed the LLM's context window
- 4. Resource-Intensive Queries: Submitting demanding queries with complex patterns
- 5. Model Extraction via API: Using crafted inputs to replicate model behavior
- 6. Functional Model Replication: Using the target model to generate synthetic training data
- 7. Side-Channel Attacks: Exploiting input filtering to harvest model architecture information
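Most of these vectors begin with unbounded input reaching the model. A minimal pre-flight guard can reject oversized prompts before any inference cost is incurred; the limits and the crude token estimate below are illustrative assumptions, not values prescribed by this document:

```python
# Pre-flight input guard: reject requests that exceed size limits before
# they reach the model. Thresholds here are hypothetical examples.

MAX_CHARS = 8_000          # hard cap on raw input size
MAX_APPROX_TOKENS = 2_000  # rough token budget (approximated as whitespace words)

def validate_prompt(prompt: str) -> str:
    """Return the prompt if it passes basic size checks, else raise ValueError."""
    if len(prompt) > MAX_CHARS:
        raise ValueError(f"input exceeds {MAX_CHARS} characters")
    # Crude token estimate; a production system would use the model's own tokenizer.
    if len(prompt.split()) > MAX_APPROX_TOKENS:
        raise ValueError(f"input exceeds ~{MAX_APPROX_TOKENS} tokens")
    return prompt
```

Rejecting at the edge like this addresses both input flood and context-window overflow at near-zero cost, since no model resources are consumed for a refused request.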
>Common Impacts
- Service unavailability (DoS)
- Unsustainable financial costs
- Intellectual property theft
- Model weight and architecture exposure
- Service degradation for legitimate users
>Prevention & Mitigation Strategies
- 1. Implement strict input validation ensuring inputs don't exceed reasonable size limits
- 2. Limit exposure of logits and logprobs in API responses
- 3. Apply rate limiting and user quotas to restrict requests per time period
- 4. Monitor and manage resource allocation dynamically
- 5. Set timeouts and throttle processing for resource-intensive operations
- 6. Restrict the LLM's access to network resources, internal services, and APIs
- 7. Implement comprehensive logging, monitoring, and anomaly detection
- 8. Implement watermarking to detect unauthorized use of LLM outputs
- 9. Design for graceful degradation under heavy load
- 10. Restrict queued actions, with dynamic scaling and load balancing
- 11. Train models to detect and mitigate adversarial queries and extraction attempts
- 12. Build and use glitch token filtering lists
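Rate limiting (strategy 3 above) is commonly implemented as a token bucket per user. The sketch below is a minimal in-memory version; the capacity and refill rate are hypothetical, and a production deployment would back this with a shared store such as Redis:

```python
# Token-bucket rate limiter enforcing per-user request quotas.
# Capacity and refill rate are illustrative placeholders.
import time

class TokenBucket:
    def __init__(self, capacity: float = 10, refill_per_sec: float = 1.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available, refilling based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets: dict[str, TokenBucket] = {}  # one bucket per user id

def request_allowed(user_id: str) -> bool:
    bucket = buckets.setdefault(user_id, TokenBucket())
    return bucket.allow()
```

A burst up to the bucket capacity is permitted, after which requests are refused until tokens refill, which bounds sustained request volume without penalizing normal interactive use.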
>Attack Scenarios
#1 Uncontrolled Input Size
An attacker submits an unusually large input, resulting in excessive memory usage and CPU load, potentially crashing the system.
#2 High Volume Requests
An attacker transmits a high volume of requests, causing excessive resource consumption and making the service unavailable to legitimate users.
#3 Denial of Wallet
An attacker generates excessive operations to exploit the pay-per-use model of cloud-based AI services, causing unsustainable costs.
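A direct defense against this scenario is a per-user spend ledger that refuses requests once a budget is exhausted. The unit price and daily cap below are hypothetical placeholders:

```python
# Per-user daily spend ledger to bound pay-per-use costs.
# Price and budget values are hypothetical placeholders.
from collections import defaultdict

PRICE_PER_1K_TOKENS = 0.002   # assumed unit cost in USD
DAILY_BUDGET_USD = 5.00       # assumed per-user daily cap

spend = defaultdict(float)    # user_id -> USD spent today

def charge(user_id: str, tokens_used: int) -> None:
    """Record usage; raise RuntimeError once the daily budget would be exceeded."""
    cost = tokens_used / 1000 * PRICE_PER_1K_TOKENS
    if spend[user_id] + cost > DAILY_BUDGET_USD:
        raise RuntimeError("daily budget exceeded; request refused")
    spend[user_id] += cost
```

Checking the projected cost before serving the request, rather than after, ensures an attacker cannot overshoot the cap with a single expensive call.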
#4 Model Replication
An attacker uses the LLM's API to generate synthetic training data and fine-tunes another model, creating a functional equivalent.