eBay uses fault injection via code instrumentation at the application level

eBay engineers have been using fault injection techniques to improve the reliability of their notification platform and to explore its weaknesses. While fault injection is a common industry practice, eBay took a new approach, using code instrumentation to bring fault injection to the application level.
The platform is responsible for pushing platform notifications to third-party applications, providing the latest changes in item prices, item stock status, payment status, and more. It is a large, highly distributed system that relies on many external dependencies, including distributed storage, message queues, push notification endpoints, and more.
Fault injection, eBay engineer Wei Chen said, typically happens at the infrastructure level: for example, by inducing network failures that surface as HTTP errors such as server disconnects or timeouts, or by making a given resource temporarily unavailable. This approach is costly and affects much of the rest of the system, which makes it difficult to study the effects of a single failure in isolation.
But that’s not the only possible approach, Chen said. Faults can also be created at the application level, for example by adding a specific delay in an HTTP client library to simulate a timeout.
“We instrumented the class files of the client libraries our service depends on to introduce the different types of faults we defined,” Chen said. “The injected faults are thrown while our service communicates with the underlying resources through the instrumented APIs. Because the faults come from code changes, the failures never actually happen in our dependent services; only their effects are simulated, which allows us to experiment without risk.”
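As a rough illustration of the application-level idea, the sketch below wraps an HTTP client and adds an artificial delay before each call; when the delay reaches the caller's timeout, the wrapper fails with a timeout instead of calling the dependency. SimpleHttpClient and DelayInjectingClient are hypothetical names invented for this example, not part of eBay's code or of any real HTTP library, and a plain decorator stands in here for the class-file instrumentation eBay actually uses.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.time.Duration;

// Hypothetical minimal client abstraction; not eBay's API or a real library's API.
interface SimpleHttpClient {
    String get(String url, Duration timeout) throws IOException;
}

// Wraps a real client and injects an artificial delay before each call.
// If the injected delay reaches the caller's timeout, the call fails the way
// a timed-out request would, and the real dependency is never contacted.
class DelayInjectingClient implements SimpleHttpClient {
    private final SimpleHttpClient delegate;
    private final Duration injectedDelay;

    DelayInjectingClient(SimpleHttpClient delegate, Duration injectedDelay) {
        this.delegate = delegate;
        this.injectedDelay = injectedDelay;
    }

    @Override
    public String get(String url, Duration timeout) throws IOException {
        boolean exceedsTimeout = injectedDelay.compareTo(timeout) >= 0;
        // Wait no longer than the timeout itself, as a real client would.
        sleep(exceedsTimeout ? timeout : injectedDelay);
        if (exceedsTimeout) {
            throw new SocketTimeoutException("Injected delay of " + injectedDelay.toMillis()
                    + " ms exceeded timeout of " + timeout.toMillis() + " ms for " + url);
        }
        return delegate.get(url, timeout);
    }

    private static void sleep(Duration d) throws IOException {
        try {
            Thread.sleep(d.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IOException("Interrupted during injected delay", e);
        }
    }
}
```

Because only the wrapper changes, the remote service sees no extra load, which matches the "simulate the effect, not the failure" goal described above.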
eBay implemented three basic techniques to force called methods to misbehave: interrupting the method logic, for example by throwing an exception; changing the state of a method, for example by changing the value returned by response.getStatusCode(); and replacing the value of a method parameter, that is, modifying the argument passed to the method.
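The three fault types can be sketched as a small wrapper around a single call. This is a minimal illustration under stated assumptions: the Response class, the FaultType names, and the String payload are hypothetical, and the wrapper applies faults explicitly in ordinary Java rather than through instrumented bytecode as eBay does.

```java
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for an HTTP client's response object.
final class Response {
    private final int statusCode;

    Response(int statusCode) { this.statusCode = statusCode; }

    int getStatusCode() { return statusCode; }
}

// The three fault categories described above, plus NONE for normal behavior.
// The enum names are illustrative, not eBay's.
enum FaultType { NONE, INTERRUPT, MODIFY_STATE, REPLACE_ARGUMENT }

final class FaultInjector {
    private final FaultType type;
    private final UnaryOperator<String> argumentRewriter;
    private final UnaryOperator<Response> resultRewriter;

    private FaultInjector(FaultType type, UnaryOperator<String> argumentRewriter,
                          UnaryOperator<Response> resultRewriter) {
        this.type = type;
        this.argumentRewriter = argumentRewriter;
        this.resultRewriter = resultRewriter;
    }

    static FaultInjector none() {
        return new FaultInjector(FaultType.NONE, UnaryOperator.identity(), UnaryOperator.identity());
    }

    static FaultInjector interrupt() {
        return new FaultInjector(FaultType.INTERRUPT, UnaryOperator.identity(), UnaryOperator.identity());
    }

    static FaultInjector modifyState(UnaryOperator<Response> rewriter) {
        return new FaultInjector(FaultType.MODIFY_STATE, UnaryOperator.identity(), rewriter);
    }

    static FaultInjector replaceArgument(UnaryOperator<String> rewriter) {
        return new FaultInjector(FaultType.REPLACE_ARGUMENT, rewriter, UnaryOperator.identity());
    }

    // Runs target(payload) with the configured fault applied around it.
    Response invoke(String payload, Function<String, Response> target) {
        switch (type) {
            case INTERRUPT:
                // 1. Interrupt the method logic: the real call never runs.
                throw new IllegalStateException("Injected fault: call interrupted");
            case MODIFY_STATE:
                // 2. Let the real call run, then change what the caller sees,
                //    e.g. the value returned by getStatusCode().
                return resultRewriter.apply(target.apply(payload));
            case REPLACE_ARGUMENT:
                // 3. Swap the parameter value before the real call receives it.
                return target.apply(argumentRewriter.apply(payload));
            default:
                return target.apply(payload);
        }
    }
}
```

MODIFY_STATE still exercises the dependency while the caller observes a failure, whereas INTERRUPT skips the dependency entirely; the two answer different questions about how the caller copes.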
“To implement these three types of instrumentation, we created a Java agent,” Chen said. “In the agent, we implemented a class loader that instruments the code of the methods used in the application code. We also created an annotation to indicate which methods will be instrumented, and put the instrumentation logic in the annotated methods.”
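A runnable approximation of annotation-driven fault injection follows. It deliberately swaps in a different technique: a JDK dynamic proxy (java.lang.reflect.Proxy) intercepts calls made through an interface, instead of a Java agent rewriting class files. The InjectFault annotation, the NotificationSender interface, and the fault ids are hypothetical. A conventional Java agent would instead register a java.lang.instrument.ClassFileTransformer from its premain method and typically rely on a bytecode library to rewrite the marked methods.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Set;

// Hypothetical marker: only methods carrying it are eligible for fault injection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface InjectFault {
    String value(); // illustrative fault id, not an eBay attribute
}

// Hypothetical dependency interface standing in for a client library.
interface NotificationSender {
    @InjectFault("push-endpoint")
    int push(String endpoint, String payload);

    int ping(); // not annotated: always passes through untouched
}

final class FaultInjectingProxy {
    private FaultInjectingProxy() {}

    // Returns a proxy that throws for annotated methods whose fault id is active
    // and delegates every other call to the real implementation.
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, Set<String> activeFaults) {
        InvocationHandler handler = (proxy, method, args) -> {
            InjectFault marker = method.getAnnotation(InjectFault.class);
            if (marker != null && activeFaults.contains(marker.value())) {
                throw new IllegalStateException(
                        "Injected fault at " + method.getName() + " (" + marker.value() + ")");
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the real method's own exception
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

The proxy only works when the dependency is reached through an interface, which is one reason a bytecode-level agent is more general: it can also instrument concrete classes inside third-party client libraries.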
Additionally, eBay engineers implemented a configuration management system to change the behavior of fault injection dynamically at runtime. In particular, for each endpoint supported by the eBay application, engineers can adjust values or parameters to test specific behaviors.
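Runtime reconfiguration can be sketched as a thread-safe, per-endpoint settings store that fault-injection code consults on every call. The settings fields (a failure rate and an injected delay), the endpoint keys, and the class names are assumptions for illustration; the source does not describe the schema of eBay's configuration system.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.DoubleSupplier;

// Hypothetical per-endpoint settings; field names are illustrative, not eBay's schema.
final class EndpointFaultSettings {
    final double failureRate;     // fraction of calls to fail, 0.0 to 1.0
    final Duration injectedDelay; // extra latency to add to each call

    EndpointFaultSettings(double failureRate, Duration injectedDelay) {
        if (failureRate < 0.0 || failureRate > 1.0) {
            throw new IllegalArgumentException("failureRate must be in [0, 1]");
        }
        this.failureRate = failureRate;
        this.injectedDelay = injectedDelay;
    }
}

// Thread-safe store that can be updated while the application runs,
// e.g. from a config-change listener, so experiments change without a redeploy.
final class FaultConfigStore {
    private final Map<String, EndpointFaultSettings> byEndpoint = new ConcurrentHashMap<>();
    private final DoubleSupplier random; // injectable so decisions can be deterministic in tests

    FaultConfigStore(DoubleSupplier random) { this.random = random; }

    void update(String endpoint, EndpointFaultSettings settings) { byEndpoint.put(endpoint, settings); }

    void clear(String endpoint) { byEndpoint.remove(endpoint); }

    // True if this call to the endpoint should be failed under the current settings.
    boolean shouldFail(String endpoint) {
        EndpointFaultSettings s = byEndpoint.get(endpoint);
        return s != null && random.getAsDouble() < s.failureRate;
    }

    // Delay to inject for the endpoint; zero when no experiment is configured.
    Duration delayFor(String endpoint) {
        EndpointFaultSettings s = byEndpoint.get(endpoint);
        return s == null ? Duration.ZERO : s.injectedDelay;
    }
}
```

In production the random source could be () -> ThreadLocalRandom.current().nextDouble(); a fixed supplier, as in a test, makes each decision reproducible.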
According to Chen, eBay is the first organization in the industry to use code instrumentation for fault injection at the application level. If you are interested in this approach, don’t miss the full details in the original article.