
How Ravelin selects & engineers ML features to detect and prevent fraud

Every business has data, but not all of it is relevant for fraud. Here's how Ravelin selects and engineers ML features for fraud detection recommendations.

What is a feature in machine learning?

Features are the inputs to a machine learning (ML) model. In other words, they are the data an ML model uses to generate a prediction, and they can range from simple to very complex.

At a basic level, a feature can be an individual measurable property or characteristic, such as the cost of a transaction. But it can also be much more complicated: a measurable property considered only within a certain time frame, or a combination of several of these data points.
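
To make the distinction concrete, here is a minimal sketch of one simple feature and one time-windowed feature. The field names, the 24-hour window and the sample data are illustrative assumptions, not Ravelin's actual feature definitions.

```python
from datetime import datetime, timedelta

def order_count_in_window(order_times, now, window):
    """Count orders placed in the interval (now - window, now]."""
    start = now - window
    return sum(1 for t in order_times if start < t <= now)

now = datetime(2024, 1, 2, 12, 0)
order_times = [
    datetime(2024, 1, 1, 9, 0),    # older than 24 hours, not counted
    datetime(2024, 1, 1, 18, 0),   # inside the window
    datetime(2024, 1, 2, 11, 30),  # inside the window
]

features = {
    # Simple feature: a single measurable property
    "transaction_amount": 49.99,
    # More complex feature: a property considered within a time frame
    "orders_last_24h": order_count_in_window(order_times, now, timedelta(hours=24)),
}
```

Both values end up in the same feature dictionary, but the second one needs the customer's order history and a clock, which is what makes it harder to engineer.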

Feature engineering is the process of extracting these meaningful characteristics to use as learning material for the ML model, which is being trained to give us fraud recommendations.

ML features at Ravelin are grouped into feature megafamilies, which share some characteristics and help provide transparency for fraud recommendations and for each client's fraudscape.

How are ML features engineered at Ravelin?

Ravelin uses a continuous integration and deployment philosophy to build machine learning models at scale to support all our ML-enabled solutions, including payment fraud and refund abuse prevention. These ML models are built on features and feature megafamilies.

For Ravelin, every company is different, even within the same sector. So we believe in building a dedicated, custom model for each merchant rather than one model for everyone, or one model per group of merchants, such as all retailers or all travel websites. Our ten years of experience in fraud detection and prevention have shown that deploying individual ML models brings the best possible results.

In practice, it is a multi-model approach: there are different models for each Ravelin client, aligned with their industry, fraud appetite, KPIs and strategy. When a new model is prepared for a Ravelin merchant, it competes against the existing model, and the better performer of the two is deployed.
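
The competition between a new and an existing model can be sketched roughly as below. The article does not say which metric decides the winner, so recall on labeled fraud cases at a fixed score threshold is purely an assumption here, as are the function names and the sample scores.

```python
def recall_at_threshold(scores, labels, threshold):
    """Share of known fraud cases (label 1) scoring at or above the threshold."""
    fraud_scores = [s for s, y in zip(scores, labels) if y == 1]
    if not fraud_scores:
        return 0.0
    return sum(s >= threshold for s in fraud_scores) / len(fraud_scores)

def pick_model(existing_scores, new_scores, labels, threshold=0.5):
    """Return which model to deploy; the existing model wins ties."""
    existing = recall_at_threshold(existing_scores, labels, threshold)
    new = recall_at_threshold(new_scores, labels, threshold)
    return "new" if new > existing else "existing"

# Hypothetical holdout set: 1 = confirmed fraud, 0 = genuine
labels = [1, 1, 0, 0]
existing_scores = [0.9, 0.3, 0.2, 0.1]  # catches one of two fraud cases
new_scores = [0.8, 0.7, 0.4, 0.1]       # catches both fraud cases

winner = pick_model(existing_scores, new_scores, labels)
```

A real comparison would likely weigh several metrics against the merchant's fraud appetite and KPIs; a single number keeps the sketch short.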

Types of features at Ravelin

We look for features to capture certain characteristics of a customer or user that help us predict fraud. Here are some examples of the types of features we use to build ML models:

Traditional features

These are the typical data points used to predict fraud, for example orders, transactions, cards, location and email.

These features generally cover the data you would expect to find on your receipt and are customer-centric. They're also fairly simple, often relying on one fraud signal or data point.

Behavioral features

We also derive behavioral features from the customer session. These features describe the customer's actions, e.g. velocity of orders, time spent on the page, or length of time between adding a new card and making an order. This type of feature is more complex to engineer, as it combines more than one data point.

One purpose of extracting these features is to capture subversive technology use. For example, we can tell whether a fraudster is using a script to scrape a webpage instead of browsing like a legitimate, everyday shopper.
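
As a rough illustration of behavioral features such as order velocity and the gap between adding a card and ordering, consider the sketch below. The function, its parameters and the session data are hypothetical and only show how several data points combine into one feature.

```python
from datetime import datetime

def behavioral_features(session_start, session_end, card_added_at, order_times):
    """Derive two behavioral features from one customer session."""
    session_hours = (session_end - session_start).total_seconds() / 3600
    # First order placed at or after the moment the card was added, if any
    first_order = min((t for t in order_times if t >= card_added_at), default=None)
    return {
        "orders_per_hour": len(order_times) / session_hours if session_hours > 0 else 0.0,
        "seconds_card_add_to_order": (
            (first_order - card_added_at).total_seconds() if first_order is not None else None
        ),
    }

features = behavioral_features(
    session_start=datetime(2024, 1, 1, 12, 0, 0),
    session_end=datetime(2024, 1, 1, 12, 30, 0),
    card_added_at=datetime(2024, 1, 1, 12, 10, 0),
    order_times=[
        datetime(2024, 1, 1, 12, 10, 20),
        datetime(2024, 1, 1, 12, 15, 0),
        datetime(2024, 1, 1, 12, 25, 0),
    ],
)
```

Here three orders in a half-hour session, with the first one placed 20 seconds after the card was added, would produce a high velocity and a very short card-to-order gap.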

Network-derived features (Connect features)

Network-derived features focus on link analysis and network topology (network shape) as a means of enhancing our customer data. They help us understand which data points are shared by more than one customer, which configurations of shared data are normal or expected, and which are not.

For example, a family who live in the same house might share accounts, and that is perfectly reasonable. But when you have a case of account takeover activity, you might see hundreds of accounts accessing a website from just a handful of devices, if not just a single device.
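
The device-sharing example can be expressed as a simple link count. This is a minimal sketch over made-up login records, not a description of how Ravelin's Connect features are computed.

```python
from collections import defaultdict

def accounts_per_device(logins):
    """Map each device ID to the number of distinct accounts seen on it."""
    accounts_by_device = defaultdict(set)
    for account_id, device_id in logins:
        accounts_by_device[device_id].add(account_id)
    return {device: len(accounts) for device, accounts in accounts_by_device.items()}

# Hypothetical (account, device) login pairs
logins = [
    ("alice", "family-laptop"),
    ("bob", "family-laptop"),
    ("alice", "family-laptop"),  # repeat login, same account
    ("u1", "device-x"),
    ("u2", "device-x"),
    ("u3", "device-x"),
    ("u4", "device-x"),
]

sharing = accounts_per_device(logins)
```

Two accounts on a family laptop look ordinary; the same count running into the hundreds for a single device would be the account takeover pattern described above.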

Individual customer features

Ravelin's solutions for fraud prevention focus on people, not transactions. This allows us to zoom out and consider the bigger picture. Rather than homing in only on the data around one specific transaction, we look at what we know about the customer themselves.

The individual customer ML features tell us how consistent current behavior is with this customer's typical past behavior. This could be their typical spend, regular billing address, home IP address, etc.

For instance, the ML model will spot whether a customer who usually makes very small purchases is suddenly looking to buy ten big-ticket items in one go. It's not an indication of fraud per se, but once considered along with all the other features, it can mean something is amiss.
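
One way to quantify how far a purchase sits from a customer's typical spend is to compare it with their historical median. The ratio below is an assumed, simplified measure chosen for illustration; the article does not specify how consistency is actually scored.

```python
from statistics import median

def spend_ratio(past_amounts, current_amount):
    """Ratio of the current order value to the customer's median past spend.

    Returns None when there is no usable history, so the model can treat
    the feature as missing instead of as a misleading zero.
    """
    if not past_amounts:
        return None
    typical = median(past_amounts)
    if typical <= 0:
        return None
    return current_amount / typical

past_amounts = [10, 12, 8, 10]  # a customer who usually makes small purchases
ratio = spend_ratio(past_amounts, 500)
```

A large ratio is not proof of fraud on its own; it is one input the model weighs alongside all the other features.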

Real-time features

Real-time features are based on up-to-date, real-world incidences of fraud. These features are all built on categorical data and give the real-time rate of fraud by category, for instance by country, ASN, card digits or email domain. An example is the current average fraud rate in a given region or country.

One purpose of these features is to support the secure growth of merchants looking to expand into new markets, for which they have no historical data of their own. By monitoring the real-time traffic, these companies can seamlessly move into new markets, without seeing any adverse effects from the machine learning models, such as bias.
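
A running fraud rate per category might be tracked along the lines below. The class name and the smoothing toward a prior rate are assumptions added for illustration: smoothing is one common way to give a sensible value for a category with little or no history, such as a newly entered market.

```python
from collections import defaultdict

class CategoryFraudRate:
    """Running fraud rate per category value, smoothed toward a prior rate."""

    def __init__(self, prior_rate=0.01, prior_weight=20):
        self.prior_rate = prior_rate      # assumed baseline fraud rate
        self.prior_weight = prior_weight  # how many observations the prior is worth
        self.totals = defaultdict(int)
        self.frauds = defaultdict(int)

    def update(self, category, is_fraud):
        """Record one observed outcome for a category value."""
        self.totals[category] += 1
        self.frauds[category] += int(is_fraud)

    def rate(self, category):
        """Smoothed fraud rate; unseen categories fall back to the prior."""
        n = self.totals.get(category, 0)
        k = self.frauds.get(category, 0)
        return (k + self.prior_rate * self.prior_weight) / (n + self.prior_weight)

tracker = CategoryFraudRate()
for i in range(80):
    # Hypothetical stream: every fourth order from this email domain is fraud
    tracker.update("example.com", is_fraud=(i % 4 == 0))

seen_rate = tracker.rate("example.com")
unseen_rate = tracker.rate("new-market.example")
```

The same structure works for any categorical key, such as country or ASN, by keeping one tracker per category type.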

Session-tracking features

These features are a little more involved than the behavioral features we looked at above.

Session-tracking features cover the data we get from JavaScript and can include:

whether the customer is pasting a card number into the checkout rather than typing it
their browser cookies
whether the customer is using a password vault
etc.

One purpose of these features is to be able to identify genuine customer behavior. For instance, a legitimate customer will normally take time to select the size of an item of clothing or look at the sizing chart, rather than do it instantly. If this is done instantly, there is perhaps a script running on that page – and legitimate customers don't run scripts.
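
The signals above could be turned into features roughly as follows. The event names, the millisecond timestamps and the 300 ms threshold are all hypothetical; they stand in for whatever the JavaScript on the page actually reports.

```python
def session_features(events, min_human_ms=300):
    """Build session features from (event_name, timestamp_ms) pairs."""
    first_seen = {}
    for name, timestamp_ms in events:
        first_seen.setdefault(name, timestamp_ms)  # keep the first occurrence only

    ms_to_select_size = None
    if "page_load" in first_seen and "size_selected" in first_seen:
        ms_to_select_size = first_seen["size_selected"] - first_seen["page_load"]

    return {
        "card_number_pasted": "card_paste" in first_seen,
        "ms_to_select_size": ms_to_select_size,
        # Faster than a person could plausibly react: possibly a script
        "size_selected_instantly": (
            ms_to_select_size is not None and ms_to_select_size < min_human_ms
        ),
    }

events = [("page_load", 0), ("size_selected", 40), ("card_paste", 900)]
features = session_features(events)
```

On their own these flags are weak signals, since genuine customers also paste card numbers; they become useful when the model combines them with the other feature types.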

Entity features

We also divide features into customer-centric and entity-centric. Entities are things like devices, addresses, locations, domains and emails.

An example feature is the number of orders shipped to a certain address.

One purpose of these features is to alert us to a drop-off point for fraudulently obtained goods, or to help generate heat maps of fraudulent activity.
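
Counting orders per shipping address is straightforward once addresses are comparable. The sketch below uses a deliberately naive normalization (lowercasing and collapsing punctuation and whitespace) and invented orders; real address matching is considerably harder.

```python
from collections import Counter

def normalize_address(address):
    """Naive normalization: lowercase, drop commas, collapse whitespace."""
    return " ".join(address.lower().replace(",", " ").split())

def orders_per_address(orders):
    """Count orders by normalized shipping address."""
    return Counter(normalize_address(o["shipping_address"]) for o in orders)

# Hypothetical orders from different customers
orders = [
    {"customer": "c1", "shipping_address": "12 High St, London"},
    {"customer": "c2", "shipping_address": "12 high st  London"},
    {"customer": "c3", "shipping_address": "5 Oak Rd, Leeds"},
]

counts = orders_per_address(orders)
```

Because the key is the address and not the customer, the count rises even when every order comes from a different account, which is what makes it entity-centric.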


Authors

Jono MacDougall CTO

A Physics graduate hailing from Canada, Jono has built a career defined by technical variety and complex engineering challenges. His journey as…