Automated GUI testing will kill you.

An Evolution of Automated Testing Philosophy

This is an evolution I have seen many people and organizations go through: a progression in learning how to write automated tests and where to write them.

Assumptions

I assume that you have someone other than the customer testing something. A QA person, a developer that doesn’t suck at testing (these are rare)…just someone.

I am talking mostly about web applications. Some of this will apply to desktop application testing, but that sort of automated testing just sucks in general.

A lot of what you can do regarding automation will depend on your tester.

“QA” means quality assurance.

Your tester is male because “he” is shorter to type.

“He” refers to the tester.

“You” might be the developer, company, or tester.

You have a good architecture for your app, with little if any business logic in your presentation layer. I assume this because if your business logic lives in the presentation layer, then the GUI is probably where you need to test. Just know that it will be hard for many reasons that I will not go into here.

I am not talking about legacy applications. Testing those can be a very different experience altogether.

You and the QA person know how to write a good test. That alone hides a million assumptions, and you can give yourself many more problems with poor architecture, poor coding practices, a lack of hooks for testing, and so on.

The Levels of Automated Testing Evolution

Level 0

You start writing an application. You have a QA guy who was hired into QA because he was no good at coding. He failed your developer interview, so you told him to test your code. There is a place for these people, but not in the scope of this article.

Level 1 — Record and Playback

You start writing your application after getting back from hiring a QA guy who can write some code. You now have an engineer who could develop alongside you but chooses to focus on testing.

He starts by clicking through the application and logs some bugs.

Some stuff gets changed. He has to click through it again, and decides enough is enough. It is time to automate some of this repetitive clicking — he is right.

Now there is a choice to make. Which tool is the right tool? Most people will settle on Selenium. It is pretty much an industry standard for GUI testing of a web application. Plus, there is a recorder that plugs right into Firefox.

After a short while, you will have built up quite a large suite of tests that were super simple to create. This works great until you change something.

The great weakness of these tests is that they are just text files. There are no functions, no reusable code, no way to maintain them. If you change something in the GUI, you will have to re-record many of your scripts or do search and replace through directories of text files using sed or awk or perl.

You will quickly realize that this is not a sustainable approach and begin progressing to level 2. Most people spend very little time at level 1.

It is true that there is a place for these kinds of tests. I have used the Selenium IDE to do some simple browser automation. We had a huge form that was under heavy development, and all I needed to do was populate it over and over. I was crunched for time, and I fully intended to throw the test away once the form shipped. However, you could argue that I would have saved time in the long run by choosing a different automation strategy.

Level 2 — Driving the Browser

Selenium WebDriver allows you to drive the browser from a lower level. Each browser requires its own driver to make it go. You can write in pretty much any language you want. You could just as well use Watir, or some other tool to control the browser. When I got here, I went with Watir.

I had solved many of the problems encountered in level 1. I had reusable code, and I created a nice page-object model before I even knew such a thing existed. I built a nice suite of GUI tests that covered the application well.
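To make the contrast with level 1 concrete, here is a minimal page-object sketch in Python. The suite described here used Watir; the `FakeDriver` class below is a hypothetical in-memory stand-in for a real browser driver such as Selenium WebDriver or Watir, and the selectors and page names are invented for illustration. The point is that each selector lives in exactly one place, so a GUI change means one edit instead of a search and replace through directories of recorded scripts.

```python
class FakeDriver:
    """Hypothetical in-memory stand-in for a browser driver (illustration only)."""

    def __init__(self):
        self.fields = {}
        self.page_text = {}

    def fill(self, selector, value):
        self.fields[selector] = value

    def click(self, selector):
        # Pretend the app greets the user after the login button is pressed.
        if selector == "#login-button":
            user = self.fields.get("#username", "")
            self.page_text["#greeting"] = f"Welcome, {user}"

    def text(self, selector):
        return self.page_text.get(selector, "")


class LoginPage:
    """Everything the tests know about the login page's markup lives here."""

    USERNAME = "#username"
    PASSWORD = "#password"
    SUBMIT = "#login-button"

    def __init__(self, driver):
        self.driver = driver

    def log_in(self, username, password):
        self.driver.fill(self.USERNAME, username)
        self.driver.fill(self.PASSWORD, password)
        self.driver.click(self.SUBMIT)
        # Navigation returns the next page object, so tests read like a user flow.
        return HomePage(self.driver)


class HomePage:
    GREETING = "#greeting"

    def __init__(self, driver):
        self.driver = driver

    def greeting(self):
        return self.driver.text(self.GREETING)


home = LoginPage(FakeDriver()).log_in("alice", "s3cret")
print(home.greeting())
```

If the login button's id changes, only `LoginPage.SUBMIT` has to change; every test that calls `log_in` keeps working unchanged.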

Now that I had a great suite of tests, I needed to run them and get the results. Most testers at this point have their tests in their own domain. Dev completes a feature, QA runs the tests that they have.

Here is how that goes:

QA: Hey dev, are you done with that feature?
Dev: Yeah.
QA: Did you follow the spec?
Dev: Sort of.
QA: How am I supposed to know if it works?
Dev: …
QA: Does it work?
Dev: Of course.
QA: How do you know?
Dev: …

…but that is another story.

So the QA guy tests the new feature. He and the dev get all the bugs worked out. Then the QA guy begins his regression. He runs his automated tests, nurses them through, looks at the output, and arrives at a point of feeling generally confident about the release. He announces that the release is ready.

Problems at this level

On the surface, this scenario sounds great. A GUI test gives a great bang for your buck since it tests all layers of the application at once. The QA guy ran the tests and the code got shipped.

Some problems will inevitably arise, if they have not already. As your suite of tests grows, it will require more and more maintenance, and just getting through a full run will take more and more time.

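Now is a good time to sidetrack into the fundamental flaws of the GUI test. They include flakiness (tests that fail intermittently, often because of timing), slowness (every test drives a real browser through the whole stack), and brittleness (a cosmetic change to the page can break many tests at once). There are more.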

As you run larger and larger suites of tests, the false negatives from randomly failing tests will create a picture of quality that grows increasingly cloudy. You will begin to feel the weight of the test suite bearing down on you. You may say, “We don’t have this problem. We keep our tests well maintained.” Ask yourself how much time you are really spending maintaining those tests and whether it is worth it. Ask yourself, “Is there a better way?”
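A back-of-the-envelope calculation shows why the picture gets cloudy as the suite grows. Assume, purely for illustration, that each GUI test fails spuriously 0.5% of the time and that those failures are independent. The chance of a run with no spurious failures is then (1 − p)^n, which falls off quickly: with 500 tests it is only about 8%.

```python
def chance_of_clean_run(num_tests: int, flake_rate: float) -> float:
    """Probability that no test fails spuriously, assuming every test flakes
    independently with the same probability (a simplifying assumption)."""
    return (1.0 - flake_rate) ** num_tests


# Hypothetical 0.5% per-test flake rate at a few suite sizes.
for n in (50, 200, 500):
    print(n, round(chance_of_clean_run(n, 0.005), 3))
```

Real flakiness is rarely independent or uniform, so treat this as intuition rather than a model of any particular suite. The direction is what matters: the bigger the GUI suite, the rarer a fully green run.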

Another problem is that the knowledge of how to run the tests, their quirks, and what constitutes a good result lives entirely with the QA guy. He may know that a particular test fails but that the failure is not a real problem.

A potential problem is that the results are produced once, when the tests are run, and are not archived or stored in any way. There is no way to go back to see what happened with any particular release. To be fair, the inability to produce any data for anyone, including management (who may be questioning the value of the time you spend maintaining your suite of tests), is a problem that can occur at many levels of maturity.

Level 3 — The framework

As the problems of level 2 progress and make themselves apparent, the natural tendency is to try to fix them. The temptation is there to build a framework around the tests to help manage the problems.

The framework will ultimately contain a lot of things. One big component is the ability to run the tests and report the results. This is an attempt to solve two problems: only the QA guy runs the tests, and the results are never reported.

The framework will also attempt to abstract information away from any specific application. This is inherently difficult to do, especially when your only consumer is yourself. But, as an engineer, it is the responsible thing to do.

The problems

It is arguable that you did solve some problems. Potentially anyone can run the tests: the QA guy can be on vacation and someone else can still run them. Alas, this is probably a false hope. Your GUI tests will fail randomly, and only the QA guy knows which failures are “OK.” You can produce a nice chart to give to management. However, this approach fails to solve any of the fundamental problems inherent in GUI tests (to be fair, you probably never claimed it would). One of the biggest problems you have just given yourself is a framework to maintain on top of your tests. The question of whether testing is actually saving you money gets harder and harder to answer in the affirmative.

This is where people (read: management and the people who make budgets) start to question you. Why are we spending so much money to run the tests? Why does it take you so long to do a regression? I thought your tests were automated? This chart you gave me has 10% failing tests? Why did we ship?

This level can end in a couple of ways: you are crushed by the weight of your own flaky tests, or the designer re-skins the app and breaks all your tests.

Level 4

There is no way out of level 3 that allows you to “salvage” the work you have done. It is gone. More trouble than it is worth. Forget it…most of it.

It is time for a clean start. So where do you start? You start where you should have started in the first place, a place that seems counterintuitive at first glance to the efficient tester. After all, a GUI test tests everything at once: the whole stack of code. It is the best place…except for all the problems we just discussed.

Where you need to start is at the bottom. You need a nice layered approach to your automated tests. Many, many, many unit tests. Many integration tests running at the service level. A few tests at the GUI level.

The goal of testing is not and never has been to make sure everything works. The goal is to reach a level of confidence in your software that you are comfortable with, and then ship it. You want a quality product which includes so much more than just being free of bugs and errors, and you want to get it as cheaply as possible.

Why this is a good idea

As developers write unit tests, they will be forced to think more carefully about how they coded something. Simply writing the tests will produce better code and fewer bugs. As you write your tests at the service level, or at some other level below the GUI, you will end up with tests that run very quickly and are very robust. Additionally, in the age of JavaScript front ends, you cannot rely only on what you see in the browser. You have to know what data is being returned but not shown (and whether it is a problem or a security risk). Services may change, but they will change much less frequently and less drastically than the front end. As such, your tests will work well, and work for a long time.
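As a sketch of a service-level test that checks data the browser never shows, assume a hypothetical `get_user_profile` function sitting behind a JSON endpoint. The in-memory `_USERS` store and the choice of which fields count as public are invented for the example. The second test catches a leak that no GUI test would notice, because the leaked field is never rendered on the page.

```python
import unittest

# Hypothetical backing store; a real service would query a database.
_USERS = {
    42: {
        "id": 42,
        "name": "Alice",
        "email": "alice@example.com",
        "password_hash": "not-a-real-hash",
    },
}

# Hypothetical policy: only these fields may leave the service.
PUBLIC_FIELDS = {"id", "name"}


def get_user_profile(user_id):
    """Hypothetical service call: return only the fields the front end may see."""
    record = _USERS[user_id]
    return {key: record[key] for key in PUBLIC_FIELDS}


class UserProfileServiceTest(unittest.TestCase):
    def test_returns_public_fields(self):
        self.assertEqual(get_user_profile(42), {"id": 42, "name": "Alice"})

    def test_never_exposes_private_fields(self):
        # The browser would render none of these, so only a test below the GUI sees them.
        profile = get_user_profile(42)
        for field in ("password_hash", "email"):
            self.assertNotIn(field, profile)
```

Because these are plain `unittest` tests, they run with Python's standard test runner, next to the developers' unit tests, in milliseconds.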

Put your tests with the code

Code your tests right next to the code, in the same language if possible. If you or any developer can run OR write your integration-level tests right alongside the unit tests, the likelihood of doing so will go up. If your tests are run, bugs will be found before they ever get to QA, saving time and money. You solve the problem of only the QA guy running the tests, and you didn’t have to build a framework to run them: you can use the tried and true unit testing frameworks. There are many ways to get the results of the tests, especially if you are running them in some sort of continuous build environment.

So what about the GUI?

You still need some GUI tests, but you need very few. Keep the random failures and flaky tests to a minimum. Do all the testing you can at the lowest level possible, then look at what is left that genuinely needs checking at the GUI level.

I still believe that some GUI tests are necessary. But, (and this is a big but) you don't need to check everything.

To better illustrate my point, here are some rather simple examples of what I think are appropriate GUI tests.

GUI Only

You have a page that does client-side validation of a form. That validation runs only in the browser, so the browser is where you check it.

GUI/Services Tag Team

You can enter your name into a form and save it. Test this once in the GUI: the happy path. Then, at the service level, if the service’s risk profile warrants it, write tests that check everything else (long names, short names, fat names, skinny names, names that climb on rocks, even names with chicken pox, hot dogs, etc.).
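Here is what the service-level half of that tag team might look like, as a minimal sketch. `save_name`, `ValidationError`, and the 100-character limit are hypothetical stand-ins for whatever your real service does. The GUI test covers only the single happy path, while `subTest` lets one test method walk through the long, short, and oddly shaped names cheaply.

```python
import unittest

MAX_NAME_LENGTH = 100  # hypothetical business rule


class ValidationError(ValueError):
    pass


def save_name(raw_name, store):
    """Hypothetical service behind the 'save your name' form."""
    name = raw_name.strip()
    if not name:
        raise ValidationError("name is required")
    if len(name) > MAX_NAME_LENGTH:
        raise ValidationError("name is too long")
    store.append(name)
    return name


class SaveNameServiceTest(unittest.TestCase):
    def test_accepted_names(self):
        cases = ["A", "Bo", "José", "Nguyễn Văn A", "O'Brien",
                 "x" * MAX_NAME_LENGTH, "  padded  "]
        for raw in cases:
            with self.subTest(name=raw):
                store = []
                self.assertEqual(save_name(raw, store), raw.strip())
                self.assertEqual(store, [raw.strip()])

    def test_rejected_names(self):
        for raw in ["", "   ", "x" * (MAX_NAME_LENGTH + 1)]:
            with self.subTest(name=raw):
                store = []
                with self.assertRaises(ValidationError):
                    save_name(raw, store)
                # Nothing should be persisted when validation fails.
                self.assertEqual(store, [])
```

Adding another odd name is a one-line change to a list, and the whole class runs in a fraction of the time a single browser test takes.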

The End Result

Your development velocity will improve. You will build a sustainable suite of tests that run more quickly, report their data, and are decentralized from the QA organization. Your tests will run much more consistently with far fewer false negatives. Your confidence will improve and management will be able to more clearly see the value you are providing.

Level 5: Testing what matters

Once here, hopefully you have gained enough experience to know what not to test or what not to automate. This may be counterintuitive, but not everything needs to be automated.

Tests run automatically when they need to run.

Only the tests that need to be executed run.

Results are reported in ways that are useful to all stakeholders and interested parties.

Level 5 is a beautiful amalgamation of automation, configuration management and tests that run and work.

This list of the ideal could go on...
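As one illustration of “only the tests that need to be executed run,” here is a sketch of change-based test selection. The `TEST_MAP` patterns and test file names are hypothetical; a real setup might derive the changed files from version control and the mapping from coverage data or project conventions. When a changed file matches no known pattern, the function falls back to running everything, trading speed for safety.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from source path patterns to the tests that cover them.
# Note: in fnmatch, "*" also matches "/", so "billing/*" covers nested files.
TEST_MAP = {
    "billing/*": ["tests/test_billing.py", "tests/test_invoices_service.py"],
    "accounts/*": ["tests/test_accounts.py"],
    "web/templates/*": ["tests/gui/test_smoke.py"],
}

ALL_TESTS = sorted({test for tests in TEST_MAP.values() for test in tests})


def select_tests(changed_files):
    """Return the sorted test files affected by the changed source files."""
    selected = set()
    for path in changed_files:
        matched = False
        for pattern, tests in TEST_MAP.items():
            if fnmatchcase(path, pattern):
                selected.update(tests)
                matched = True
        if not matched:
            # Unknown territory: be conservative and run the whole suite.
            return ALL_TESTS
    return sorted(selected)


print(select_tests(["billing/models/invoice.py"]))
```

A change under `web/templates/` pulls in the small GUI smoke suite, while a billing change runs only the fast service-level tests for billing.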

Conclusions

Don’t start at the GUI with your automated tests. Just resist the temptation. If someone suggests that you do, just slowly back out of the room and run.

You know your application, what it needs to be, and where it is going. You know what level of testing you need to provide. The basic principle here is that enough testing at the GUI level will crush you, and that most automated testing can be accomplished in a much more efficient way. Unfortunately, that efficiency depends on having a lot of other things in place. Take a good look at where you can get the most value for your money. There is a place for automated GUI testing. That place should be as small as possible.