Efficient enterprise testing — test frameworks (5/6)

#testing tuesday, october 01, 2019

This part of the article series will cover test frameworks and my thoughts and experiences on when and whether to apply them.

Thoughts on test frameworks

The reason why I’m not too excited about most test frameworks is that, from my view, they mostly add some syntactic sugar and conveniences, but per se don’t solve the problem of having maintainable test code. In other words, if your test suite is not maintainable without a specific test technology, it will hardly improve by simply introducing another test framework.

I claim that the biggest impact on the readability of test code comes from crafting test code APIs and components with proper abstraction and delegation. This doesn’t depend on any particular technology but is done in plain Java, in test cases that can be executed by JUnit. For verifying the specific steps, AssertJ has proven itself well. We can define custom assertions that are specific to our business logic, which further increases the readability of our code. If the test cases need to mock classes that are out of scope, Mockito does an excellent job at this.

I claim that these test technologies are already sufficient. In particular, the advent of JUnit 5 introduced further enhancements for setting up dynamic or parameterized test suites.
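To illustrate what abstraction and delegation in test code can look like, here is a minimal sketch in plain Java. All names (`CoffeeOrderSystem`, `CreateOrderScenario`, and their methods) are hypothetical and only chosen for this example. An in-memory fake stands in for the real application, and the hand-written check stands in for what would usually be an AssertJ assertion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component that hides how the system under test is accessed.
// It is an in-memory fake here; in a real project it might call the
// application over HTTP.
class CoffeeOrderSystem {

    private final List<String> orderTypes = new ArrayList<>();

    String createOrder(String type) {
        orderTypes.add(type);
        return "order-" + orderTypes.size();
    }

    boolean hasOrderOfType(String type) {
        return orderTypes.contains(type);
    }
}

// The scenario reads like the business case and delegates all technical
// details to the component above.
class CreateOrderScenario {

    private final CoffeeOrderSystem coffeeOrderSystem;

    CreateOrderScenario(CoffeeOrderSystem coffeeOrderSystem) {
        this.coffeeOrderSystem = coffeeOrderSystem;
    }

    // With JUnit, this method would carry the @Test annotation.
    void createAndVerifyOrder() {
        String orderId = coffeeOrderSystem.createOrder("Espresso");
        verifyOrderCreated(orderId, "Espresso");
    }

    private void verifyOrderCreated(String orderId, String expectedType) {
        if (orderId == null || orderId.isEmpty())
            throw new AssertionError("Expected an order id but got: " + orderId);
        if (!coffeeOrderSystem.hasOrderOfType(expectedType))
            throw new AssertionError("Expected an order of type " + expectedType);
    }
}
```

The scenario method stays short and free of technical detail, which is where most of the readability comes from, regardless of which framework executes it.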

Still, there are some test frameworks that are worth looking into. I’m not against introducing further test technology at all, since it can certainly improve readability and efficiency during testing. However, I claim that paying attention to test code quality is crucial, while additional test frameworks are optional.

Spock is a testing framework that comes with a Groovy API and that is fairly well-known and used in projects, with the goal of increasing readability and maintainability. However, I’d still ask how much benefit this technology adds. If developers are happy with its syntax and approach, fine; but if the project is otherwise written purely in Java, managing and configuring the additional dependency might take more effort than the benefits justify. From experience, we spent quite some time configuring Groovy and its versions on all development machines and in the CI/CD environment, as well as configuring the Maven builds. Since I claim the biggest return on investment comes from test code quality, regardless of the technology being used, the actual benefits of having a framework such as Spock in complex projects are rather small.

Testcontainers is a technology to set up and manage Docker containers during the test life cycle. It enables developers to orchestrate a local test environment that may include the application-under-test, external systems, mock servers, or databases. The open-source project uses the Java wrapper for Docker under the hood and binds the container life cycles to the test runtime.

While this approach can be very convenient for defining the whole environment within our test cases and reducing the management to a single entry point, that is, the execution of the Java test class, I usually advocate against coupling the test scenarios to the test environment life cycle. Restarting and redeploying the local test environment for every test case takes too much time and delays the immediate feedback. To minimize the overall turnaround time, developers should keep a local environment running for a long time and run idempotent test scenarios against that environment. It’s easier to manage that setup if the test cases don’t fiddle with the life cycle. In the future, it’ll be possible with Testcontainers to keep the declared containers running beyond the test cases. However, defining the life cycle externally, via shell scripts, Docker Compose, or Kubernetes, is in my opinion clearer and easier, without the need for another abstraction. We’ve had some minor issues with the Docker Java wrapper in the past, for example when the format of the config JSON file changed. The advantages of abstractions such as wrapping tools into Java APIs are, from my view, often not very big, yet they come with a certain effort in configuration and maintenance, and we often ended up building workarounds for their limitations.

For this reason, I still consider it the simplest solution to set up local test environments using (bash) scripts or similar approaches that are executed separately. The responsibility for managing the environment, its setup and tear-down, is thus clearly defined; the test scenarios only use and verify the local environment and can run instantly. Using shell scripts or technology such as Docker Compose directly might not be that fancy, but compared to how much time you can spend on managing dependencies, configuring runtimes, and integrating life cycles for a (Java-based) abstraction, it’s actually much faster to define. Ideally, we define a single action that sets up our local environment for our development session. Our CI/CD pipeline can use a similar approach, or it might use a more complex setup anyway, such as deploying our applications to a Kubernetes cluster.
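A test scenario that only uses an already running environment could look roughly like the following sketch. It relies on the JDK's built-in HTTP client (Java 11 or later). The class name, the `/health` path, the `test.base-url` system property, and the default address are assumptions made for this example, not part of any real project.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical smoke scenario. It never starts or stops anything; the
// environment is expected to be managed by a script or Docker Compose.
class HealthCheckScenario {

    private final URI baseUri;
    private final HttpClient client = HttpClient.newHttpClient();

    HealthCheckScenario() {
        // Assumed property name and default; adjust to the actual environment.
        this(URI.create(System.getProperty("test.base-url", "http://localhost:8080")));
    }

    HealthCheckScenario(URI baseUri) {
        this.baseUri = baseUri;
    }

    // Idempotent: a read-only request that can be repeated at any time.
    int healthStatus() throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(baseUri.resolve("/health"))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    void verifyApplicationIsUp() throws IOException, InterruptedException {
        int status = healthStatus();
        if (status != 200)
            throw new AssertionError(
                    "Expected status 200 from " + baseUri + " but got " + status);
    }
}
```

Because the scenario only reads the address of the environment, the same class can run against a locally started setup or against an environment that the CI/CD pipeline provides.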

Another benefit of using plain technology to run our tests is that the test scenarios can typically be reused easily for other test scopes. For example, when we’re using the JAX-RS client instead of REST Assured to connect to our application within our test scenarios, we can easily extract these scenarios and reuse the code to drive performance or stress tests. The same is true when we define test scenarios that are valid for multiple test scopes, by simply swapping out some lower-level components. The more the test framework modifies and influences the test life cycle, the harder that reuse becomes. In general, I’m advocating for separating the concerns of the test life cycle, the scenarios, and the implementation of individual steps within the scenarios.
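The idea of swapping lower-level components and reusing a scenario in another scope can be sketched as follows. The interface, its in-memory implementation, and the naive load driver are all hypothetical names for this example. A real setup would likely provide an HTTP-based implementation of the interface as well.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical lower-level component. One implementation could use an HTTP
// client against a deployed application, another could work in-process.
interface OrderClient {
    String createOrder(String type);
    String orderType(String orderId);
}

// In-memory implementation, used here so that the sketch runs on its own.
class InMemoryOrderClient implements OrderClient {

    private final Map<String, String> orders = new HashMap<>();

    @Override
    public String createOrder(String type) {
        String id = UUID.randomUUID().toString();
        orders.put(id, type);
        return id;
    }

    @Override
    public String orderType(String orderId) {
        return orders.get(orderId);
    }
}

// The scenario knows nothing about life cycles or transport.
class OrderScenario {

    private final OrderClient client;

    OrderScenario(OrderClient client) {
        this.client = client;
    }

    void createAndVerifyOrder(String type) {
        String id = client.createOrder(type);
        String actual = client.orderType(id);
        if (!type.equals(actual))
            throw new AssertionError("Expected order type " + type + " but was " + actual);
    }
}

// The same scenario reused in a different scope: a naive load driver that
// returns the elapsed time in nanoseconds.
class OrderLoadDriver {

    static long runRepeatedly(OrderScenario scenario, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            scenario.createAndVerifyOrder("Espresso");
        }
        return System.nanoTime() - start;
    }
}
```

Since the scenario receives its client from the outside and no framework controls its life cycle, a functional test and a load driver can both call it without changes.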

One technology that does make it easy to reuse test scenarios in multiple scopes is Cucumber. I like the approach of defining the scenarios in a very abstract way and implementing the execution separately. The test cases are defined with Gherkin, in human language, ideally from a pure business perspective without technical leaks; the implementations of the test cases can be swapped. This somewhat forces a cut between these layers. In some projects, using the Gherkin format in Cucumber tests has proven useful for communicating with business domain experts or folks who have little or no programming experience. On the other hand, I have also seen domain experts and QA engineers who were absolutely fine with reading Java code, if the test scenario methods were short and very expressive in what they are testing. The clearer we are in naming methods and internal APIs, the more others can read the code like prose. This experience affirmed the idea that additional technology on top of properly crafted Java code is not necessarily required.

In general, the more complex a project grows, the smaller the impact of test technology on productivity, readability, and maintainability, and the more important it becomes that we care about test code quality, properly crafted abstraction layers, and separation of concerns. If developers want to use additional technology on top of that, that’s fine, but we need to be aware of the trade-offs, e.g. how much time it takes to configure an alternative JVM language, its dependencies and versions, and the additional weight of having yet another technology in our stack, compared to the benefit of some syntactic sugar on some layers. Readability and maintainability come from crafting proper abstraction layers, separating concerns, and naming. The clarity about what went wrong when assertions fail comes mostly from the assertion technology, e.g. AssertJ, which does a great job of reporting which assertion failed and for what reason, given that the developers did a decent job of writing the assertions in the first place.

This is what I often see underrepresented in tutorials or in demos in presentations about testing: if we look at simple, hello-world-like examples, the importance of proper test code quality and structure might not be immediately self-evident, while the added syntactic sugar looks like a huge gain in a small scenario.
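As a small illustration of the point about assertions made above, the following sketch shows a business-specific assertion with descriptive failure messages. It is written in plain Java so that it runs on its own. With AssertJ, such a class would typically extend its `AbstractAssert` base class instead. The `Order` type and all method names are hypothetical.

```java
import java.util.Objects;

// Hypothetical domain type, used only for this sketch.
class Order {

    final String type;
    final String status;

    Order(String type, String status) {
        this.type = type;
        this.status = status;
    }
}

// Hand-written, business-specific assertion with a fluent API.
class OrderAssert {

    private final Order actual;

    private OrderAssert(Order actual) {
        this.actual = actual;
    }

    static OrderAssert assertThatOrder(Order actual) {
        if (actual == null)
            throw new AssertionError("Expected an order but it was null");
        return new OrderAssert(actual);
    }

    OrderAssert hasType(String expected) {
        if (!Objects.equals(actual.type, expected))
            throw new AssertionError(
                    "Expected order type <" + expected + "> but was <" + actual.type + ">");
        return this;
    }

    OrderAssert isInStatus(String expected) {
        if (!Objects.equals(actual.status, expected))
            throw new AssertionError(
                    "Expected order status <" + expected + "> but was <" + actual.status + ">");
        return this;
    }
}
```

A test step can then read `assertThatOrder(order).hasType("Espresso").isInStatus("PREPARING")`, and a failure names the business property that did not match instead of just reporting a failed boolean check.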

The next and last part of this series will briefly cover additional end-to-end tests.
