
Machine Learning for catching attackers

Dec 11, 2017 6:16:00 PM

Hackers use increasingly smart tools these days. A growing breed of attacks routinely bypasses the web application firewall (WAF), the first line of defence at most websites, because the requests appear legitimate to the WAF. So how do you catch a sophisticated web attacker posing as a legitimate client? One answer is to look for inconsistencies in their story.

Each web, mobile, or API transaction tells a story, with the details buried in the transaction's many layers. For a legitimate actor, every aspect and attribute that can be examined will be consistent with every other. A malicious actor, however, will inevitably fall short somewhere when put under enough scrutiny. It is like a seasoned bouncer at a nightclub: if you know where to look, you can often spot the teenagers with fake IDs.

This requires analysing the many details of the presented story and comparing them exhaustively across every available dimension, which is arduous with traditional methods.
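To show why hand-written checks become arduous, here is a minimal sketch of the traditional approach. It assumes a hypothetical `Request` record whose fields (`claimed_os`, `platform_hint`, `claimed_browser`, `runs_javascript`) are illustrative only and not taken from any real product. Each rule compares two attributes of the story and reports a contradiction.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical request attributes; the field names are illustrative only.
@dataclass
class Request:
    claimed_os: str        # OS parsed from the User-Agent header
    platform_hint: str     # OS reported by another layer (hypothetical probe)
    claimed_browser: str   # browser parsed from the User-Agent header
    runs_javascript: bool  # whether the client executed a JavaScript challenge


def check_os(req: Request) -> Optional[str]:
    """Rule: the OS in the User-Agent must match the OS seen elsewhere."""
    if req.claimed_os != req.platform_hint:
        return f"User-Agent claims {req.claimed_os} but platform reports {req.platform_hint}"
    return None


def check_js(req: Request) -> Optional[str]:
    """Heuristic rule: a client claiming to be a browser should run JavaScript."""
    browsers = {"Chrome", "Firefox", "Safari", "Edge"}
    if req.claimed_browser in browsers and not req.runs_javascript:
        return f"claims to be {req.claimed_browser} but did not execute JavaScript"
    return None


# Every new attribute means writing and maintaining more rules like these.
RULES = [check_os, check_js]


def inconsistencies(req: Request) -> List[str]:
    """Run every hand-written rule and collect the contradictions found."""
    found = []
    for rule in RULES:
        msg = rule(req)
        if msg is not None:
            found.append(msg)
    return found


legit = Request("Windows", "Windows", "Chrome", True)
suspect = Request("Windows", "Linux", "Chrome", False)
print(inconsistencies(legit))         # []
print(len(inconsistencies(suspect)))  # 2
```

Every rule here is written by hand, and the number of attribute pairs worth comparing grows roughly quadratically with the number of attributes (n attributes give n(n-1)/2 pairs), which is what makes exhaustive comparison hard to scale this way. The JavaScript rule is also only a heuristic: a genuine browser can have JavaScript disabled.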

At Stealth Security we tackle this with algorithms that leverage Machine Learning (ML). We generate a range of ML models trained on a diverse collection of web, mobile, and API requests. Put simply, the models are trained to recognise patterns. Beyond that, they are designed to learn inductively and to reason. The big advantage of this approach is that it generalises and scales. As our models are exposed to more traffic in the field, their reasoning abilities incrementally strengthen. The models then automatically cross-examine each attribute, using logic learned and extrapolated from all the traffic they have previously seen.

The ultimate benefit comes when the models accurately predict what a client really is (not what it superficially advertises) without ever having seen that client before. This may appear remarkable, and counter-intuitive compared with traditional algorithms, but what is happening is akin to a child telling you confidently that the neighbour's new pet is not only a dog but a Labrador, and a puppy at that. The child has never met the dog before, but has seen many examples of dogs and not-dogs and has internally generalised the many features of animals. Our models do the same with internet traffic.
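As a rough illustration of this kind of prediction (not Stealth Security's actual models, which this post does not describe), the sketch below swaps in a deliberately simple technique: a categorical naive Bayes classifier with Laplace smoothing. It predicts a client's family from hypothetical observed attributes (`tls`, `header_order`, `js`), ignoring what the User-Agent claims, and then checks whether the prediction matches the advertised family. The feature names, labels, and training data are all made up for illustration. Smoothing lets the model score a client whose TLS fingerprint it has never seen, and `update()` can be called as new labelled traffic arrives.

```python
import math
from collections import defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace smoothing.

    Counts are updated incrementally, so the model keeps learning
    as more labelled traffic is fed to update().
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = defaultdict(int)
        # feature_counts[label][feature][value] -> count
        self.feature_counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        # feature -> set of values seen for it (used for smoothing)
        self.feature_values = defaultdict(set)
        self.total = 0

    def update(self, features, label):
        self.class_counts[label] += 1
        self.total += 1
        for name, value in features.items():
            self.feature_counts[label][name][value] += 1
            self.feature_values[name].add(value)

    def log_score(self, features, label):
        # Unnormalised log P(label) + sum of log P(value | label).
        n_label = self.class_counts[label]
        score = math.log(n_label / self.total)
        for name, value in features.items():
            count = self.feature_counts[label][name][value]
            # Include the queried value so unseen values still get a share.
            n_values = len(self.feature_values[name] | {value})
            score += math.log((count + self.alpha) / (n_label + self.alpha * n_values))
        return score

    def predict(self, features):
        return max(self.class_counts, key=lambda lbl: self.log_score(features, lbl))


def story_is_consistent(model, features, claimed_family):
    """True if what the client appears to be matches what it claims to be."""
    return model.predict(features) == claimed_family


# Hypothetical labelled traffic: (observed attributes, true client family).
training = [
    ({"tls": "tls-A", "header_order": "h1", "js": True}, "browser"),
    ({"tls": "tls-A", "header_order": "h2", "js": True}, "browser"),
    ({"tls": "tls-B", "header_order": "h1", "js": True}, "browser"),
    ({"tls": "tls-C", "header_order": "h3", "js": False}, "automation"),
    ({"tls": "tls-D", "header_order": "h3", "js": False}, "automation"),
    ({"tls": "tls-C", "header_order": "h4", "js": False}, "automation"),
]

model = CategoricalNaiveBayes()
for features, label in training:
    model.update(features, label)

# A never-before-seen TLS fingerprint, but the other attributes look automated.
new_client = {"tls": "tls-E", "header_order": "h3", "js": False}
print(model.predict(new_client))                         # automation
print(story_is_consistent(model, new_client, "browser"))  # False
```

Naive Bayes treats attributes as independent given the class, which is a strong simplification; it is used here only because it is short and shows the idea of generalising from past examples to an unseen client.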

This post was a quick introduction to the concepts behind some of our machine learning models. In the next post we will walk through a use case to reveal what is happening under the bonnet.

Written by Seiji Armstrong

Principal Data Scientist