LLM10: Unbounded Consumption
>Control Description
Unbounded Consumption occurs when an LLM application allows users to conduct excessive and uncontrolled inferences, leading to denial of service (DoS), economic losses, model theft, and service degradation. The high computational demands of LLMs make them vulnerable to resource exploitation and unauthorized usage.
>Vulnerability Types
- 1. Variable-Length Input Flood: Overloading the LLM with numerous inputs of varying lengths
- 2. Denial of Wallet (DoW): Exploiting the cost-per-use model with high-volume operations
- 3. Continuous Input Overflow: Sending inputs that exceed the LLM's context window
- 4. Resource-Intensive Queries: Submitting demanding queries with complex patterns
- 5. Model Extraction via API: Using crafted inputs to replicate model behavior
- 6. Functional Model Replication: Using the target model to generate synthetic training data
- 7. Side-Channel Attacks: Exploiting input filtering to harvest model architecture information
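Most of these vectors begin with unbounded input reaching the model. A minimal pre-flight guard can reject oversized prompts before any inference cost is incurred; the limits and the crude token estimate below are illustrative assumptions, not values prescribed by this document:

```python
# Pre-flight input guard: reject requests that exceed size limits before
# they reach the model. Thresholds here are hypothetical examples.

MAX_CHARS = 8_000          # hard cap on raw input size
MAX_APPROX_TOKENS = 2_000  # rough token budget (approximated as whitespace words)

def validate_prompt(prompt: str) -> str:
    """Return the prompt if it passes basic size checks, else raise ValueError."""
    if len(prompt) > MAX_CHARS:
        raise ValueError(f"input exceeds {MAX_CHARS} characters")
    # Crude token estimate; a production system would use the model's own tokenizer.
    if len(prompt.split()) > MAX_APPROX_TOKENS:
        raise ValueError(f"input exceeds ~{MAX_APPROX_TOKENS} tokens")
    return prompt
```

Rejecting at the edge like this addresses both input flood and context-window overflow at near-zero cost, since no model resources are consumed for a refused request.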
>Common Impacts
- Service unavailability (DoS)
- Unsustainable financial costs
- Intellectual property theft
- Model weight and architecture exposure
- Service degradation for legitimate users
>Prevention & Mitigation Strategies
- 1. Implement strict input validation ensuring inputs don't exceed reasonable size limits
- 2. Limit exposure of logits and logprobs in API responses
- 3. Apply rate limiting and user quotas to restrict requests per time period
- 4. Monitor and manage resource allocation dynamically
- 5. Set timeouts and throttle processing for resource-intensive operations
- 6. Restrict the LLM's access to network resources, internal services, and APIs
- 7. Implement comprehensive logging, monitoring, and anomaly detection
- 8. Implement watermarking to detect unauthorized use of LLM outputs
- 9. Design for graceful degradation under heavy load
- 10. Restrict queued actions, with dynamic scaling and load balancing
- 11. Train models to detect and mitigate adversarial queries and extraction attempts
- 12. Build and use glitch token filtering lists
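Rate limiting (strategy 3 above) is commonly implemented as a token bucket per user. The sketch below is a minimal in-memory version; the capacity and refill rate are hypothetical, and a production deployment would back this with a shared store such as Redis:

```python
# Token-bucket rate limiter enforcing per-user request quotas.
# Capacity and refill rate are illustrative placeholders.
import time

class TokenBucket:
    def __init__(self, capacity: float = 10, refill_per_sec: float = 1.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available, refilling based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets: dict[str, TokenBucket] = {}  # one bucket per user id

def request_allowed(user_id: str) -> bool:
    bucket = buckets.setdefault(user_id, TokenBucket())
    return bucket.allow()
```

A burst up to the bucket capacity is permitted, after which requests are refused until tokens refill, which bounds sustained request volume without penalizing normal interactive use.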
>Attack Scenarios
#1 Uncontrolled Input Size
An attacker submits an unusually large input, resulting in excessive memory usage and CPU load, potentially crashing the system.
#2 High Volume Requests
An attacker transmits a high volume of requests, causing excessive resource consumption and making the service unavailable to legitimate users.
#3 Denial of Wallet
An attacker generates excessive operations to exploit the pay-per-use model of cloud-based AI services, causing unsustainable costs.
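A direct defense against this scenario is a per-user spend ledger that refuses requests once a budget is exhausted. The unit price and daily cap below are hypothetical placeholders:

```python
# Per-user daily spend ledger to bound pay-per-use costs.
# Price and budget values are hypothetical placeholders.
from collections import defaultdict

PRICE_PER_1K_TOKENS = 0.002   # assumed unit cost in USD
DAILY_BUDGET_USD = 5.00       # assumed per-user daily cap

spend = defaultdict(float)    # user_id -> USD spent today

def charge(user_id: str, tokens_used: int) -> None:
    """Record usage; raise RuntimeError once the daily budget would be exceeded."""
    cost = tokens_used / 1000 * PRICE_PER_1K_TOKENS
    if spend[user_id] + cost > DAILY_BUDGET_USD:
        raise RuntimeError("daily budget exceeded; request refused")
    spend[user_id] += cost
```

Checking the projected cost before serving the request, rather than after, ensures an attacker cannot overshoot the cap with a single expensive call.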
#4 Model Replication
An attacker uses the LLM's API to generate synthetic training data and fine-tunes another model, creating a functional equivalent.