Okay, let's cover the final topic in this section: 2.5.e Circuit Breaker Pattern. This is a critical pattern for building resilient distributed systems, especially microservices.

  • The Problem: Cascading Failures

    • Imagine Service A depends on Service B. If Service B becomes slow, unresponsive, or starts failing frequently, Service A might keep making requests to it.
    • These requests might time out after a long delay, consuming resources (threads, connections, memory) on Service A while waiting.
    • If many requests are stuck waiting for Service B, Service A itself can run out of resources and become unresponsive, potentially causing failures in services that depend on Service A. This is a cascading failure.
    • Simple retries can sometimes make the problem worse by overwhelming the already struggling Service B.
  • Definition: Circuit Breaker Pattern

    • The Circuit Breaker pattern is a software design pattern that prevents an application from repeatedly trying to execute an operation that's likely to fail.
    • It acts like an electrical circuit breaker: when failures reach a certain threshold, the circuit "trips" or "opens," and further calls are prevented for a period, allowing the downstream system time to recover.
  • States of a Circuit Breaker: A circuit breaker typically operates in three states (a minimal code sketch follows this list):

    1. CLOSED:

      • This is the normal operating state. Requests from the client (e.g., Service A) are allowed to pass through to the supplier (e.g., Service B).
      • The circuit breaker monitors the calls for failures (e.g., timeouts, specific error codes).
      • If the number of failures exceeds a configured threshold within a specific time window, the circuit breaker trips and transitions to the OPEN state.
    2. OPEN:

      • In this state, the circuit breaker immediately rejects (fails fast) all requests to the supplier without attempting the actual call. It might return an error or a default fallback response.
      • This prevents the client from wasting resources on calls that are likely to fail and protects the struggling supplier from further load, giving it time to recover.
      • The circuit breaker stays in the OPEN state for a configured timeout period. After this timeout expires, it transitions to the HALF-OPEN state.
    3. HALF-OPEN:

      • In this state, the circuit breaker allows a limited number of "trial" requests to pass through to the supplier.
      • If these trial requests succeed: The circuit breaker assumes the supplier has recovered and transitions back to the CLOSED state (resetting its failure counters). Normal operation resumes.
      • If any trial request fails: The circuit breaker assumes the supplier is still unavailable and immediately transitions back to the OPEN state, restarting the recovery timeout.
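
To make these transitions concrete, here is a minimal, illustrative circuit breaker in Java. This is a sketch, not production code: the class name, thresholds, and fallback handling are invented for illustration, it uses a simple failure counter rather than a failure-rate window, and it is not thread-safe.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Minimal, illustrative circuit breaker (single-threaded demo; not production code). */
public class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // failures that trip the breaker while CLOSED
    private final Duration openTimeout;   // how long to stay OPEN before allowing a trial call
    private State state = State.CLOSED;
    private int failureCount = 0;
    private Instant openedAt;

    public SimpleCircuitBreaker(int failureThreshold, Duration openTimeout) {
        this.failureThreshold = failureThreshold;
        this.openTimeout = openTimeout;
    }

    public <T> T call(Supplier<T> operation, Supplier<T> fallback) {
        if (state == State.OPEN) {
            // After the timeout expires, allow one trial call (HALF_OPEN); otherwise fail fast.
            if (Duration.between(openedAt, Instant.now()).compareTo(openTimeout) >= 0) {
                state = State.HALF_OPEN;
            } else {
                return fallback.get();    // reject immediately without calling the supplier
            }
        }
        try {
            T result = operation.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private void onSuccess() {
        // A successful call (including a HALF_OPEN trial) resets the breaker to CLOSED.
        failureCount = 0;
        state = State.CLOSED;
    }

    private void onFailure() {
        if (state == State.HALF_OPEN) {
            trip();                                   // trial failed: back to OPEN, restart the timeout
        } else if (++failureCount >= failureThreshold) {
            trip();                                   // too many failures while CLOSED: trip the breaker
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = Instant.now();
    }
}
```

Real implementations typically track a failure rate over a sliding window of calls, permit a configurable number of HALF-OPEN trial requests, and handle concurrency; this sketch reduces all of that to a single counter and a single probe to keep the state machine visible.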
  • Benefits:

    • Prevents Cascading Failures: Protects upstream services from being dragged down by failing downstream dependencies.
    • Fail Fast: Provides immediate feedback for operations likely to fail, preventing long waits and timeouts and improving user experience and system responsiveness.
    • Allows Recovery: Gives failing downstream services breathing room to recover without being overwhelmed by continuous requests.
    • Increased Resilience: Makes the overall system more robust and tolerant of partial failures.
  • Implementation:

    • Often implemented using client-side libraries (e.g., Resilience4j for Java, Polly for .NET, or Netflix Hystrix, which is now in maintenance mode); a hedged Resilience4j example follows this list.
    • Can also be implemented in API Gateways or Service Mesh proxies (like Istio, Linkerd) that sit between services.
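
Below is a hedged sketch of what client-side usage might look like with Resilience4j. The configuration values, the class and method names on the application side (ServiceAClient, callServiceB), and the fallback string are assumptions made for illustration; the Resilience4j calls reflect its documented public API, but check the documentation for the exact version you use.

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

import java.time.Duration;
import java.util.function.Supplier;

public class ServiceAClient {
    private final CircuitBreaker breaker;

    public ServiceAClient() {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .failureRateThreshold(50)                        // trip when >= 50% of recorded calls fail...
                .slidingWindowSize(20)                           // ...measured over the last 20 calls
                .waitDurationInOpenState(Duration.ofSeconds(30)) // stay OPEN for 30s before probing
                .permittedNumberOfCallsInHalfOpenState(3)        // trial calls allowed in HALF-OPEN
                .build();
        this.breaker = CircuitBreakerRegistry.of(config).circuitBreaker("serviceB");
    }

    public String fetchFromServiceB() {
        // Wrap the remote call; while the breaker is OPEN, get() throws a runtime exception
        // instead of invoking Service B, so the client fails fast.
        Supplier<String> guarded = CircuitBreaker.decorateSupplier(breaker, this::callServiceB);
        try {
            return guarded.get();
        } catch (Exception e) {
            return "fallback-response";  // default returned when the call is blocked or fails
        }
    }

    // Hypothetical remote call to Service B (HTTP client, gRPC stub, etc.).
    private String callServiceB() {
        return "real-response";
    }
}
```

When the pattern lives in a service mesh or API gateway instead, the same thresholds (failure rate, open duration, trial requests) appear as proxy configuration rather than application code, so the client service needs no library at all.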
  • In an Interview:

    • Understand the problem of cascading failures in distributed systems.
    • Explain the purpose of the Circuit Breaker pattern (to prevent cascading failures and allow recovery).
    • Describe the three states (Closed, Open, Half-Open) and how transitions occur.
    • Discuss the benefits (fail fast, resilience, preventing overload).
    • Suggest using this pattern when designing communication between services, especially if one service is known to be less reliable or prone to latency spikes.