Okay, let's cover 2.5.d Rate Limiting.
-
Definition: Rate limiting is a control mechanism used to restrict the number of requests a user, IP address, API key, or service is allowed to make to a server or API within a specified time window.
-
Purpose: Why implement rate limiting?
- Prevent Resource Exhaustion: Protects backend services (APIs, databases, etc.) from being overwhelmed by an excessive number of requests, whether intentional (abuse) or unintentional (buggy client).
- Ensure Fair Usage / Quality of Service: Prevents a single user or client from monopolizing system resources, ensuring that the service remains available and performant for other users.
- Security / Mitigate Abuse: Helps defend against various attacks like Denial-of-Service (DoS), brute-force password attempts, credential stuffing, and excessive web scraping.
- Cost Control: Limits resource consumption, especially in cloud environments where usage might be tied to costs, or when calling third-party APIs with usage quotas.
- Enforce API Quotas: Allows businesses to offer different tiers of service with varying request limits based on subscription levels.
-
Where Rate Limiting is Implemented:
- API Gateway: A very common and effective place to implement rate limiting centrally, as it's the entry point for API requests.
- Load Balancer: Some advanced Layer 7 load balancers offer rate limiting capabilities.
- Application Layer: Directly within the application code or using web framework middleware. Gives fine-grained control but requires implementation in each service.
- Dedicated Rate Limiter Service: A separate microservice can handle rate limiting logic, often using a fast cache like Redis to store counts/tokens.
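For the dedicated-service (or middleware) approach, a common pattern is one atomic counter per client per time window in Redis. A minimal sketch using the redis-py client, assuming a Redis instance on localhost (the key format, window, and limit are illustrative; this is the fixed-window counter covered below, made distributed):

```python
import time
import redis

r = redis.Redis()  # assumes a local Redis instance

WINDOW = 60   # seconds
LIMIT = 100   # requests per window

def allow(client_id: str) -> bool:
    # One counter per client per window; INCR is atomic across processes.
    key = f"ratelimit:{client_id}:{int(time.time() // WINDOW)}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, WINDOW)  # stale window keys clean themselves up
    return count <= LIMIT
```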
-
Common Rate Limiting Algorithms:
-
Token Bucket:
- Concept: Imagine a bucket with a fixed capacity holding tokens. Tokens are added to the bucket at a constant rate (e.g., 10 tokens per second). Each incoming request requires one token to proceed. If a token is available, it's consumed, and the request passes. If the bucket is empty, the request is rejected (or sometimes queued). The bucket capacity allows for bursts of requests up to that size.
- Pros: Allows bursts, smooths traffic, relatively simple.
- Cons: Implementation details matter (e.g., how often tokens are refilled).
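A minimal single-process sketch in Python (class and parameter names are illustrative; a real deployment would also need thread safety and usually shared state across servers):

```python
import time

class TokenBucket:
    """Token bucket: refill at a fixed rate, allow bursts up to capacity."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate           # tokens added per second
        self.capacity = capacity   # maximum burst size
        self.tokens = capacity     # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazy refill: add tokens for the elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1   # consume one token for this request
            return True
        return False           # bucket empty: reject
```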
-
Leaky Bucket:
- Concept: Incoming requests are added to a fixed-size queue (the bucket). Requests leak out (are processed) from the queue at a fixed, constant rate. If the queue is full when a request arrives, it's discarded.
- Pros: Guarantees a smooth, fixed output rate, regardless of input bursts.
- Cons: Bursts are processed at the fixed rate (delayed), which might not be desirable. A burst can fill the queue quickly, leading to rejected requests.
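A sketch of the queue flavor described above, where requests are admitted to the bucket here and drained at the fixed rate (names and the drain mechanics are illustrative; a real system would process the leaked requests rather than just dequeue them):

```python
import time
from collections import deque

class LeakyBucket:
    """Leaky bucket: requests queue up and leak out at a fixed rate."""

    def __init__(self, drain_rate: float, capacity: int):
        self.drain_rate = drain_rate   # requests processed per second
        self.capacity = capacity       # maximum queue (bucket) size
        self.queue = deque()
        self.last_drain = time.monotonic()

    def offer(self, request) -> bool:
        now = time.monotonic()
        # Leak: dequeue as many requests as the fixed rate allows since last check.
        leaked = int((now - self.last_drain) * self.drain_rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()   # these would now be processed
            self.last_drain += leaked / self.drain_rate  # keep fractional remainder
        if len(self.queue) < self.capacity:
            self.queue.append(request)
            return True    # accepted into the bucket (processed later)
        return False       # bucket full: discard
```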
-
Fixed Window Counter:
- Concept: A counter tracks the number of requests received within a fixed time window (e.g., 100 requests per minute). The counter resets at the beginning of each window. If the count exceeds the limit within the window, requests are rejected.
- Pros: Simple to implement.
- Cons: A burst of requests right at the edge of a window (e.g., end of minute 1 and beginning of minute 2) can allow double the rate limit temporarily.
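A minimal in-memory sketch (window size, limit, and key scheme are illustrative; the Redis example above is the distributed version of this same idea):

```python
import time
from collections import defaultdict

WINDOW = 60   # seconds
LIMIT = 100   # requests per window

# (client_id, window_index) -> count; a real store would expire old keys.
counters: defaultdict = defaultdict(int)

def allow(client_id: str) -> bool:
    window_index = int(time.time() // WINDOW)  # counter resets implicitly each window
    key = (client_id, window_index)
    counters[key] += 1
    return counters[key] <= LIMIT
```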
-
Sliding Window Log:
- Concept: Stores timestamps of incoming requests within the relevant time window (e.g., the last 60 seconds). When a new request arrives, remove timestamps older than the window duration and count the remaining timestamps. If the count is below the limit, accept the request and add its timestamp.
- Pros: Highly accurate, avoids the fixed window edge problem.
- Cons: Can consume significant memory to store all timestamps.
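A minimal sketch storing one timestamp per accepted request (names are illustrative), which makes the memory cost visible: up to LIMIT timestamps per client:

```python
import time
from collections import deque

WINDOW = 60.0   # seconds
LIMIT = 100     # requests per window

logs: dict[str, deque] = {}   # client_id -> timestamps of accepted requests

def allow(client_id: str) -> bool:
    now = time.monotonic()
    log = logs.setdefault(client_id, deque())
    # Drop timestamps that have slid out of the window.
    while log and log[0] <= now - WINDOW:
        log.popleft()
    if len(log) < LIMIT:
        log.append(now)
        return True
    return False
```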
-
Sliding Window Counter:
- Concept: A hybrid approach that offers a good balance. It approximates the sliding window by combining the current window's request count with the previous window's count, weighted by how much of the previous window still overlaps the sliding window.
- Pros: More accurate than Fixed Window, less memory-intensive than Sliding Window Log.
- Cons: More complex than Fixed Window.
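A minimal sketch of the weighted approximation (the state layout is illustrative):

```python
import time

WINDOW = 60.0   # seconds
LIMIT = 100     # requests per sliding window

# Per-client request counts for the current and previous fixed windows.
state: dict[str, dict] = {}

def allow(client_id: str) -> bool:
    now = time.time()
    window = int(now // WINDOW)
    s = state.setdefault(client_id, {"window": window, "curr": 0, "prev": 0})
    if window != s["window"]:
        # Roll forward: the old count only carries over if exactly one window passed.
        s["prev"] = s["curr"] if window == s["window"] + 1 else 0
        s["curr"] = 0
        s["window"] = window
    # Weight the previous window by how much of it the sliding window still covers.
    fraction_elapsed = (now % WINDOW) / WINDOW
    estimate = s["prev"] * (1 - fraction_elapsed) + s["curr"]
    if estimate < LIMIT:
        s["curr"] += 1
        return True
    return False
```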
-
Identifying Request Source: Limits are typically applied based on:
- API Key / User ID
- IP Address (less reliable due to NAT/proxies)
- Globally, for a specific service or endpoint (one shared limit across all clients)
-
Handling Exceeded Limits:
- The standard response is an HTTP 429 Too Many Requests status code.
- Often includes response headers like Retry-After (suggesting how long the client should wait) or headers indicating the current limit status (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset).
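As a concrete illustration, a minimal Flask sketch returning such a response (the route, the stub allow(), and the header values are all illustrative, not a prescribed API):

```python
from flask import Flask, jsonify

app = Flask(__name__)

def allow(client_id: str) -> bool:
    # Stand-in for any of the limiter sketches above.
    return False

@app.route("/api/resource")
def resource():
    if not allow("client-123"):   # client identification is illustrative
        resp = jsonify(error="rate limit exceeded")
        resp.status_code = 429
        resp.headers["Retry-After"] = "30"          # seconds; value is illustrative
        resp.headers["X-RateLimit-Limit"] = "100"
        resp.headers["X-RateLimit-Remaining"] = "0"
        return resp
    return jsonify(data="ok")
```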
-
In an Interview: Rate limiting is essential for public-facing APIs and protecting backend systems.
- Explain why you would implement it (prevent abuse/overload, ensure fairness).
- Suggest where you would implement it (API Gateway is a common, sensible choice).
- Show awareness of different strategies (Token Bucket is easy to explain).
- Mention the 429 Too Many Requests status code.