In automated testing for software, end to end tests have a bad reputation. They are slow; they are flaky. Ask any interview candidate about testing and they will blurt out something about “the test pyramid”. The oft-repeated assertion is that you should have a gazillion unit tests, a modest number of API tests, and very few end to end tests. There’s a pleasing logic to it, whereby the number of tests is inversely proportional to the size of the test. The trouble is, in my experience it doesn’t hold up in practice.
Let’s briefly recap what each kind of test does. A unit test verifies that a self-contained piece of the code does what it should – generally a single method. An API test checks an API call and its corresponding slice of back-end functionality, probably including a database. In a microservice architecture the intention is generally to test one microservice. Finally, an end to end test drives the UI in order to simulate user interaction and test the complete system.
The problem with unit tests
Clearly API and end to end tests do the best job of confirming that the system will actually work. The trouble with unit tests is that too often they are rather tautological in nature. If you have a setFoo() method, chances are you have a testSetFoo() test. This keeps your code coverage metrics high and gives you a false sense of confidence – after all, you’ve got over 90% unit test coverage, right? If setFoo() has no interesting business logic, however, you’re merely setting a property and testing that the property was set, which has negligible value. Worse, if you change the implementation the test will start failing and need updating even when nothing is wrong. (Conversely, end to end tests are highly sensitive to UI layout changes, but don’t need to be modified when the inner implementation changes.) Rather than aiming for such high unit test coverage, I prefer to identify the methods which have some complexity to them, and test just those. To give one example, anything involving date and time manipulation generally has a lot of annoying edge cases which you should test.
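As a sketch of the kind of unit test that earns its keep, consider a hypothetical add_one_month helper (the name and behaviour are invented for illustration). Its month-end clamping is exactly the sort of date edge case worth pinning down:

```python
import calendar
from datetime import date

def add_one_month(d: date) -> date:
    """Return the same day one month later, clamping to the month end.

    A hypothetical method worth unit testing: the clamping behaviour
    around short months is an easy place for bugs to hide.
    """
    year = d.year + (d.month // 12)
    month = d.month % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

# The edge cases are what make this worth testing:
assert add_one_month(date(2024, 1, 15)) == date(2024, 2, 15)  # ordinary case
assert add_one_month(date(2024, 1, 31)) == date(2024, 2, 29)  # leap-year clamp
assert add_one_month(date(2023, 1, 31)) == date(2023, 2, 28)  # non-leap clamp
assert add_one_month(date(2023, 12, 5)) == date(2024, 1, 5)   # year rollover
```

Four assertions, and each one guards against a genuinely different failure mode – unlike a testSetFoo() that merely mirrors a setter.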
Where are the weak points?
Whilst we’re thinking about what could go wrong, it’s worth highlighting the connections between components as a major source of problems. When a system has been built by different developers who did not quite understand the contract provided by other subsystems, you have fertile ground for bugs – much more so than in a self-contained piece of code which was likely written by a single engineer. This shows the value of API and end to end tests. In fact, this principle extends beyond code and into the realm of configuration. In one analysis of real system failures I performed, I found that two thirds of outages were due to server and configuration problems, and only one third down to actual mistakes in code. If you want a highly reliable system you need to spend as much time focussing on the system configuration as you do on the code itself.
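To illustrate the contract point (all names here are invented), here is a sketch in Python of the kind of boundary bug this produces: a consumer assumes a field is always present when the contract actually makes it optional. A test that exercises both sides of the contract catches the bad assumption:

```python
# Hypothetical contract mismatch: the consumer assumes the user
# service always returns an "email" field, but the contract actually
# marks it optional. A test at the boundary catches the assumption.

def format_contact(user: dict) -> str:
    # Defensive version that honours the real contract: "email" may
    # be absent, e.g. for users who signed up by phone.
    email = user.get("email")
    return f'{user["name"]} <{email}>' if email else user["name"]

# Exercise both sides of the contract, not just the happy path.
assert format_contact({"name": "Ada", "email": "ada@example.com"}) == "Ada <ada@example.com>"
assert format_contact({"name": "Bob"}) == "Bob"
```

A naive implementation using user["email"] would pass the first assertion and blow up on the second – precisely the failure mode that lives between components rather than inside one.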
Performance matters for tests, too
Let’s look at a couple of objections often levelled at end to end tests. The primary one is that they are slow, and this is certainly the case. We’ve all been in situations where a test suite took ages to run, which seriously impacts your development cycle time, or means that the tests are taken out of the automated build pipeline altogether. There are things we can do about this, though. To begin with, look at optimising your test runner. Why throw out nine tenths of your most useful tests when it’s sometimes possible to achieve a tenfold speed-up in test run times just by optimising the way fixtures are built? Test code does not have to be pretty, and that probably means no-one has ever tried to make it efficient either.
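As a sketch of the fixture point: in Python’s unittest, building a fixture once per class (setUpClass) rather than once per test (setUp) turns setup cost from per-test into one-off. The build_fixture helper below is an invented stand-in for something genuinely slow, such as seeding a database:

```python
import time
import unittest

def build_fixture():
    """Stand-in for an expensive fixture build (e.g. seeding a database)."""
    time.sleep(0.1)  # simulate slow setup
    return {"users": ["ada", "bob"]}

class OrderTests(unittest.TestCase):
    # setUpClass runs once for the whole class, so the fixture is
    # built once instead of once per test. This is safe only if the
    # tests treat the shared state as read-only.
    @classmethod
    def setUpClass(cls):
        cls.fixture = build_fixture()

    def test_has_users(self):
        self.assertIn("ada", self.fixture["users"])

    def test_user_count(self):
        self.assertEqual(len(self.fixture["users"]), 2)
```

With a hundred tests and a one-second fixture, that single change is the difference between a two-minute suite and a few seconds – without deleting a single test.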
The other way you can speed things up is to not run all the tests for every change. This requires some kind of analysis of what you do need to run. If you employ micro frontends this is quite easy – you can run the end to end tests for just one part of the UI, just as with microservices you can run the unit tests for just one service. In any case you should always run a smoke test suite containing a few basic tests that verify the system is essentially healthy, e.g. is it possible to log in and view the home screen? A few tests like this will detect a vast array of possible failures, including deployment problems.
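A smoke suite can be very small indeed. The sketch below uses an invented FakeApp client purely for illustration – in reality this would be a browser driver or an HTTP client – but the shape of the checks is the point: shallow assertions that the system is basically alive:

```python
class FakeApp:
    """Invented stand-in for a real test client (e.g. a browser driver)."""

    def login(self, user: str, password: str) -> bool:
        return password == "secret"

    def get(self, path: str) -> int:
        return 200 if path == "/" else 404

def smoke_test(app) -> bool:
    # Each check is deliberately shallow: we only assert that the
    # system is essentially healthy (can a user log in and see the
    # home screen?), not that every feature works.
    return app.login("smoke-user", "secret") and app.get("/") == 200

assert smoke_test(FakeApp())
```

Two checks like these exercise authentication, routing, and rendering end to end, which is why a smoke suite catches broken deployments out of all proportion to its size.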
Everything’s fine: hiding the bad news
Another common objection is that it’s harder to pinpoint the fault when an end to end test fails. That’s true, but I don’t think it’s a good reason not to have them. It’s like closing your eyes so that you won’t see anything which might upset you. If an end to end test fails but your unit tests are all green, then either no unit test could have detected this particular issue, or you are missing a useful unit test somewhere. Both are valuable pieces of information, and whilst you will have an investigation ahead of you to locate the fault, as long as you add the missing tests along the way it is time well spent. Think of it as a way of detecting your “unknown unknowns”.
Likewise if a test is flaky, it may very well be the test framework which is unpredictable – but it could also be the functionality itself. Look at your production logs and see what errors and warnings you’re getting. It’s possible users are hitting reload or putting up with minor glitches due to race conditions, and your end to end tests can alert you to this.
So, to sum up – be wary of the popular idea of the “test pyramid”, with a myriad of trivial unit tests and a handful of end to end tests. Instead, work out how many tests you actually need in each level based on what might go wrong. For end to end tests that generally means one per principal use case or user flow. You might find that a “test cylinder” is actually a better model for the distribution of your tests than a pyramid!