September 14, 2020

ML Testing: How I Built an ML Algorithm to Improve Test Automation


A few years back, I was part of a team that transitioned from the standard waterfall model of software development to the then-new Agile methodology. One of the core principles of Agile is the short release cycle.

"Deliver working software frequently, from a couple of weeks to a couple of months, with a preference for the shorter timescale."

We were all in and started implementing short development cycles and frequent daily releases. We used the latest CI/CD tools. From the moment a developer announced the feature as done, it was deployed to production in less than an hour.

But we were facing a problem, a huge one which we did not see coming.

Agile Created a Testing Problem

Let's assume that we’re looking at an app of about 1,000,000 lines of code, and a developer adds a new feature that consists of an additional 1,000 lines of code.

Basically, we’re looking at an addition of ~0.1% of the entire codebase.

But in order to validate that nothing got broken, the team needs to go over the entire 100% of the application.

Testing Slowed Development

This means that the testing phase demands far more time than the actual development, and that obviously translates into a need for automation. To support such short cycles, automation has to cover a large portion of the application.

The existing tests were not sufficient. So, we started to invest more time in writing all kinds of tests: unit tests, integration, E2E, and many more. Fairly quickly, the team got to a point where it invested around 40% of the time in writing test automation. That was very inefficient.

Tests Began to Break

After a while, the team ran into a second problem: tests began to break. Nothing was broken in the app, but the tests were fragile. As the team added more tests to the suite, even a low percentage of flakiness could paralyze the entire development process. About 75% of the time was wasted on supporting the short cycle approach, and it felt like chasing our own tail.
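To see why even a low flake rate hurts as a suite grows, here is a minimal sketch. It assumes every test flakes independently with the same probability, which is a simplification; the function name and the 0.5% rate are illustrative and not figures from our project.

```python
def suite_pass_probability(num_tests: int, flake_rate: float) -> float:
    """Chance that a healthy build gets a fully green run.

    Assumes each test fails spuriously with probability `flake_rate`,
    independently of the others (a simplifying assumption).
    """
    return (1.0 - flake_rate) ** num_tests


if __name__ == "__main__":
    for num_tests in (100, 500, 1000):
        chance = suite_pass_probability(num_tests, flake_rate=0.005)
        print(f"{num_tests:>5} tests at 0.5% flakiness: {chance:.1%} chance of a green run")
```

Under these assumptions, a suite of 100 tests is green roughly 60% of the time, and a suite of 1,000 tests is green less than 1% of the time, even though nothing in the app is broken.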

So I Started a Company

A few months later, I took the opportunity and started my own company, a SaaS codeless test automation platform, which was acquired a few years later (now Perfecto Scriptless). I started it because I believed there was a real need in the market, based on what I personally experienced.

Today, Perfecto Scriptless is focused on solving two problems to help ease the transition from manual testing to test automation:

  • Fast and easy creation of test automation without the need to code.
  • Reducing maintenance by using ML to automatically heal test scenarios.

How We Applied ML to Testing: Lots of High-Quality Data

It was apparent that we needed to solve the test maintainability problem somehow. Machine learning was an obvious candidate as part of the solution, but we faced the paradox of ML. For machines to learn, they need LOTS of data.

We had no data when Perfecto Scriptless was in its early stages, so how could we leverage ML to our advantage?

Data Problem 1. Machine Learning in Test Automation Must Know a User’s Action

Fast and easy test automation creation was one of the problems and objectives we worked to solve.

One of the methods in which a user can add a test step is to model it against a real live web application. To make it happen, one needs to make a sequence of two actions:

  • Select an element.
  • Run a specific action on the selected element.

A few examples to clarify the above:

  • The user selects a button and chooses to click on it.
  • The user selects a date-picker and specifies a future date.
  • The user selects a drop-down menu and chooses a specific option.

From the basic examples above, you can understand that different elements have different sets of actions.

In TestCraft (now Perfecto Scriptless), after a user selects an element, an overlay with different actions pops up based on the type of component selected.

Figure: Action suggestion for a button in TestCraft (now Perfecto Scriptless), shown over the Google home page.

We needed an algorithm that could understand what type of element the user selected. We had to deal with a lot of different element types and countless HTML implementations, thus making it a very complex problem.
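To make the problem concrete, here is a minimal rule-based sketch of mapping an element to a type and then to a set of suggested actions. This is not the algorithm we built; it is a hand-written baseline, and the type names, the action catalog, and the function names are all hypothetical. It also shows why rules alone fall short: any custom widget built from plain div elements ends up as "unknown".

```python
from typing import Dict, List

# Hypothetical action catalog, for illustration only.
ACTIONS_BY_TYPE: Dict[str, List[str]] = {
    "button": ["click"],
    "date-picker": ["set date"],
    "drop-down": ["select option"],
    "text-input": ["type text", "clear"],
    "unknown": ["click"],
}


def classify_element(tag: str, attrs: Dict[str, str]) -> str:
    """Guess an element type from its HTML tag and attributes."""
    tag = tag.lower()
    input_type = attrs.get("type", "text").lower()
    role = attrs.get("role", "").lower()

    if tag == "select" or role in ("listbox", "combobox"):
        return "drop-down"
    if tag == "input" and input_type == "date":
        return "date-picker"
    if tag == "button" or role == "button":
        return "button"
    if tag == "input" and input_type in ("button", "submit", "reset"):
        return "button"
    if tag == "textarea":
        return "text-input"
    if tag == "input" and input_type in ("text", "email", "password", "search"):
        return "text-input"
    return "unknown"


def suggest_actions(tag: str, attrs: Dict[str, str]) -> List[str]:
    """Return the actions to show in the overlay for this element."""
    return ACTIONS_BY_TYPE[classify_element(tag, attrs)]
```

For example, `suggest_actions("div", {"role": "button"})` returns the button actions, while a date picker assembled from nested div elements with no role attribute would be missed entirely.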

Solution: Sit Before a Computer & Gather Data

For this problem, we used a DIY approach to gather the data. DIY means to sit in front of a computer for hours and to scrape the data on your own.

We created a list of a few hundred websites: S&P 500 companies, small ones, local shops, banks, insurance companies, supermarkets, and more. Next, we went one by one and got out all the relevant information for different element types and placed them into one huge Excel file. In addition, when we started approaching customers (an arduous task on its own), we added their website to the list and optimized the algorithm accordingly.
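Part of this manual work can be scripted. The sketch below uses only the Python standard library to pull interactive elements out of a page's HTML and write them as spreadsheet rows with an empty label column to fill in by hand. The choice of tags, the feature columns, and the function names are assumptions for illustration, not the exact fields we recorded.

```python
import csv
import io
from html.parser import HTMLParser
from typing import Dict, List

# Assumed set of tags worth recording, and assumed feature columns.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
FEATURES = ["tag", "type", "id", "name", "aria-label", "placeholder"]


class ElementCollector(HTMLParser):
    """Collects one row of attributes per interactive element."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = dict(attrs)
        row = {"tag": tag}
        for feature in FEATURES[1:]:
            # Attribute values can be None (e.g. <input disabled>).
            row[feature] = attr_map.get(feature) or ""
        self.rows.append(row)


def collect_elements(html: str) -> List[Dict[str, str]]:
    collector = ElementCollector()
    collector.feed(html)
    return collector.rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize rows to CSV text, with a blank 'label' column for a human."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["label"] + FEATURES)
    writer.writeheader()
    for row in rows:
        writer.writerow({"label": "", **row})
    return buffer.getvalue()
```

The resulting CSV opens directly in Excel. The labeling itself, deciding that a given row is a date-picker or a drop-down, still has to be done by a person.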

This approach may sound very dull and conventional, but it is potent and cheap. If you are considering writing your own ML algorithm, and need data to make it work — keep that in mind. There is no replacement for DIY at the beginning. As an algorithm developer, you will learn so much about the process, about how you can optimize it, and about what it’s missing.

Data Problem 2. Adapting to App Changes

Reducing maintenance was the second problem we set out to solve.

One of the most problematic tasks in writing an end-to-end test, while working in an Agile methodology, is making sure that the test will still be valid after 10 or 20 cycles of development.

A single end-to-end test consists of multiple steps. Each step usually involves some kind of UI element and either a user action or a validation. Most of the breakage in those kinds of tests is due to changes in the UI. Those changes cause the element to be misidentified, often because the test tries to identify a specific element in a way that is no longer relevant.

We started by sampling the data, gathered 100,000 test steps, and analyzed the percentage of change from one test to the other for each attribute of HTML elements. Here are a few interesting examples:

The position of an element changed from one test to the next about 1 in 5 times.

The ID attribute, which is considered very robust, changed in about 1 of every 6 consecutive tests and appeared on only about 40% of elements.

The aria-label attribute is very robust and changed in only 1 of 20 tests, but it was present on just 1 in 10 elements.

Examples of Different Attributes and Frequency of Changes

Attribute      Appearances   Changes
size           100%          3%
position       100%          21%
text           50%           17%
id             39%           15%
name           34%           14%
aria-label     10%           5%
placeholder    7%            4%
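Figures like these can be computed from recorded test steps. The sketch below assumes each element has a history of attribute snapshots, one per consecutive test run, and counts an attribute as changed when its value differs, or disappears, in the next snapshot. The data layout, the function name, and that definition of "changed" are assumptions; our own analysis may have been defined differently.

```python
from collections import Counter
from typing import Dict, List

Snapshot = Dict[str, str]  # attribute name -> value for one test run


def attribute_stats(histories: List[List[Snapshot]]) -> Dict[str, Dict[str, float]]:
    """Return the appearance rate and change rate for every attribute seen."""
    seen = Counter()      # snapshots in which the attribute is present
    compared = Counter()  # consecutive pairs where it was present before
    changed = Counter()   # ...and its value differed or vanished after
    total_snapshots = 0

    for history in histories:
        for snapshot in history:
            total_snapshots += 1
            seen.update(snapshot.keys())
        for before, after in zip(history, history[1:]):
            for attr, value in before.items():
                compared[attr] += 1
                if after.get(attr) != value:
                    changed[attr] += 1

    return {
        attr: {
            "appearance": seen[attr] / total_snapshots,
            "change": changed[attr] / compared[attr] if compared[attr] else 0.0,
        }
        for attr in seen
    }
```

Running this over a sample of test steps gives one row per attribute, which makes it easy to see which attributes are both common and stable in a given application.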


What we learned was that there is no one-size-fits-all way to identify software elements; it varies on a case-by-case basis and per application. For us, that meant that improving maintainability would require deep familiarity with each and every component of each application. That did not sound reasonable.

We decided to use ML to help us tackle that challenge and go with the approach of “get the help from your users.”

Solution: Get Help From Our Users

We initially designed the solution that later became Perfecto Scriptless to ask the user a lot of questions about element identification. If we were not 100% sure that we had found the correct element, we would ask the user to redefine it visually.

Figure: Visual element identification recovery, achieved with the help of TestCraft (now Perfecto Scriptless) users. The screenshot shows TestCraft asking the user for input on the Bing website.
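The "ask the user when unsure" flow can be sketched as a weighted match with a confidence threshold. This is only an illustration, not our production model: the weights are simply one minus the change rates from the table above, and the default weight, the threshold, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative weights: 1 minus the change rate of each attribute.
STABILITY = {
    "size": 0.97, "position": 0.79, "text": 0.83, "id": 0.85,
    "name": 0.86, "aria-label": 0.95, "placeholder": 0.96,
}
DEFAULT_WEIGHT = 0.5  # assumed weight for attributes not in the table


def match_score(stored: Dict[str, str], candidate: Dict[str, str]) -> float:
    """Weighted share of the stored attributes that the candidate still matches."""
    total = 0.0
    matched = 0.0
    for attr, value in stored.items():
        weight = STABILITY.get(attr, DEFAULT_WEIGHT)
        total += weight
        if candidate.get(attr) == value:
            matched += weight
    return matched / total if total else 0.0


def locate(
    stored: Dict[str, str],
    candidates: List[Dict[str, str]],
    threshold: float = 0.8,
) -> Tuple[Optional[Dict[str, str]], float]:
    """Return (element, score), or (None, score) when the user should be asked."""
    if not candidates:
        return None, 0.0
    best = max(candidates, key=lambda c: match_score(stored, c))
    score = match_score(stored, best)
    if score < threshold:
        return None, score  # not confident enough: ask the user to pick visually
    return best, score
```

Every time `locate` returns `None` and the user points at the right element, that answer becomes a new labeled example, which is the data-gathering loop described in this section.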


After a few months and a growing number of users on the platform, we were able to extract the data from all the elements and put in place the first Scriptless element identification model.

This method is still in place. As the model improves, it needs fewer inputs from users, so it bothers them less and less while giving more value over time. This approach is where the magic happens.

It may take more time to develop and design, but this gives the developers a genuinely scalable approach for gathering data as they go. When building your own ML solution, always think about how you can leverage your users to help you with improving your dataset.

Data Problem 3. Confirming if ML Is Testing Accurately

After we solved the resiliency and identification problems, it was time to tackle the next challenge.

We have an ML model, and we want to improve it and increase its accuracy (reduce false negatives and false positives).

How do we know that our ML algorithm is doing a good job?

How can this be done?

Solution: Outsourcing

After a long time of thinking it through, we got to our third approach of gathering data — outsourcing.

It is straightforward to outsource these days. You can do it in-house, as we did, or through an external service such as Fiverr, Upwork, Amazon Mechanical Turk, and more.

We were looking at ways to reduce both false negatives and false positives.

We built a side application, one that gives a user the ability to see two different images of screenshots with highlighted elements on both. The user would need to select "Yes" in case both refer to the same item, or "No" in case they do not.

The pairs came from the output of our existing ML algorithm, so the probability of a mistake was quite high, and they included cases where we were not sure the algorithm's output was correct.

The whole data extraction process took us some time, and we were able to provide about 30,000 elements and screenshots as an input for our human classification process.
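One common way to choose which pairs are worth a human's time is to pick the ones where the model is least certain. The sketch below assumes the model produces a score between 0 and 1 for "these two elements are the same"; the score field, the band limits, and the function name are hypothetical and not taken from our pipeline.

```python
from typing import Dict, List, Optional


def select_for_review(
    pairs: List[Dict],
    low: float = 0.3,
    high: float = 0.7,
    limit: Optional[int] = None,
) -> List[Dict]:
    """Pick pairs whose score falls in the uncertain band, most uncertain first."""
    uncertain = [pair for pair in pairs if low <= pair["score"] <= high]
    # A score of 0.5 means the model has no preference either way.
    uncertain.sort(key=lambda pair: abs(pair["score"] - 0.5))
    return uncertain if limit is None else uncertain[:limit]
```

Pairs with scores near 0 or 1 are left out, since a human answer there rarely tells the model anything new.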

After the human "Yes"/"No" classification was done by our own workers as a side task, we had a great dataset of examples that the original algorithm was unsure about, and we could feed more certain cases to improve the model. This increased our accuracy level dramatically.
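Once the human answers are in, they also give a direct way to count false positives and false negatives. The sketch below assumes each pair may receive several "yes"/"no" votes and uses a simple majority; the vote format and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_label(votes: List[str]) -> Optional[str]:
    """Return 'yes' or 'no' by majority, or None on a tie or when there are no votes."""
    counts = Counter(votes)
    if counts["yes"] == counts["no"]:
        return None
    return "yes" if counts["yes"] > counts["no"] else "no"


def error_counts(
    predicted_same: Dict[str, bool],
    votes_by_pair: Dict[str, List[str]],
) -> Counter:
    """Compare model predictions with human labels, pair by pair."""
    counts = Counter()
    for pair_id, predicted in predicted_same.items():
        label = majority_label(votes_by_pair.get(pair_id, []))
        if label is None:
            counts["unlabeled"] += 1
            continue
        actual = label == "yes"
        if predicted and not actual:
            counts["false_positive"] += 1  # model said "same", humans said no
        elif actual and not predicted:
            counts["false_negative"] += 1  # model said "different", humans said yes
        else:
            counts["correct"] += 1
    return counts
```

Tracking these counts before and after retraining on the newly labeled pairs is one way to check whether accuracy actually improved.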

I believe that outsourcing is an excellent way of gathering data when you can simplify the process. In our case, we needed to write a complete side application to do so. If you can do that, and you do it right, you can scale your data gathering immensely.

Bottom Line

Machine learning in test automation is both our future and probably the present as well. It can be extremely powerful and help solve complicated problems in ways we didn't think were possible a few years back.

You can use it to your advantage, but you need to consider the specifics of what will make the model great and of value.

As highlighted in this article, the secret is to focus on gathering a lot of high-quality data. The recommended practice with machine learning algorithms is to gather the data and organize it in a way that supports your objectives.

If you are looking to simplify the transition from manual to automation testing, experience Perfecto Scriptless to see machine learning in test automation for yourself.
