Overview
Deploying a new workflow version is scary: how do you know the new version performs better than the old one? After you’ve run a sanity test to rule out obvious problems, backtesting can help you test the workflow against a dataset to gain more confidence in the new version.
Prepare the dataset
A dataset contains the historical data you’ll use to test a workflow version. You can use either past executions or a CSV file as the dataset.
To be able to join the test results with the labels in your data warehouse, we suggest including an input feature that can serve as the join key. This is usually an ID, such as an application ID. Make sure to add this ID to the workflow as an input feature.
Backtesting with executions
To run backtests with executions, simply select a date range of executions. Sperta supports large-scale backtesting with hundreds of millions of executions.
This method supports workflows that contain data sources. However, an execution will be skipped if it doesn’t contain the data source response the current backtest needs.
For example, suppose a new workflow version approves more applications in the Knockout Rules stage. The backtest for this version may skip the executions that declined the application in the Knockout Rules stage, because those executions don’t contain the credit data the new version needs. When this happens, the status column in the backtesting result will be InvalidArgument.
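Because skipped executions still appear in the result with an InvalidArgument status, you may want to check how much of the date range a backtest actually covered before comparing versions. The following is a minimal Python sketch, assuming the result has been downloaded as CSV text with a status column (as described under “Analyze the test result”). The helpers status_breakdown and skipped_fraction are hypothetical, not part of Sperta.

```python
import csv
import io
from collections import Counter


def status_breakdown(result_csv_text: str) -> Counter:
    """Count rows per value of the status column in a downloaded backtest result."""
    reader = csv.DictReader(io.StringIO(result_csv_text))
    return Counter(row["status"] for row in reader)


def skipped_fraction(counts: Counter) -> float:
    """Share of rows skipped with InvalidArgument (e.g. missing data source responses)."""
    total = sum(counts.values())
    return counts["InvalidArgument"] / total if total else 0.0
```

A high skipped fraction means the backtest only reflects the executions that happened to contain every data source response the new version needs, so its metrics may not be representative.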
Backtesting with CSV
When you first start using Sperta, there may not be enough executions to backtest against. In that case, you can upload your own dataset as a CSV file instead.
A CSV file looks like this:
email_domain,fraud_score,credit_score,age,past_due_amount
hotmail.com,0.2,721,35,100.0
gmail.com,0.6,675,45,12.34
yahoo.com,0.4,801,28,0.0
The first line of the CSV contains the input feature IDs separated by commas. Each following line represents the feature values of a sample, such as an application or transaction. We suggest exporting the CSV from your data warehouse. There are a few details to pay attention to:
- For boolean values, we support the following formats: true, false, TRUE, and FALSE. The lowercase forms true and false are natively supported in the Sperta Expression Language. TRUE and FALSE are also supported since some spreadsheet software and data warehouses automatically convert boolean values to this format.
- CSV doesn’t support complex feature types (List, Person, etc.). Don’t add them to the CSV even if the workflow contains such features. Sperta will automatically supply an empty list for List features during the backtest.
- The maximum file size allowed is currently 1 MB. As a rule of thumb, a 1 MB CSV file can contain 30K rows with dozens of input features (columns).
Analyze the test result
After the backtest finishes, you can download the test result as a CSV file. The results are appended to the dataset as extra columns, so the file looks like this:
email_domain,fraud_score,credit_score,age,past_due_amount,status,decision,blocked_country,email_domain_in_block_list,loan_outcomes.apr,loan_outcomes.loan_amount
hotmail.com,0.2,721,35,100.0,OK,Approve,false,false,0.1,10000
gmail.com,0.6,675,45,12.34,OK,Approve,false,false,0.15,8000
yahoo.com,0.4,801,28,0.0,OK,Approve,false,false,0.1,10000
Specifically:
- status indicates whether the sample was successfully backtested. Just like in a workflow execution, errors can happen for various reasons, such as a feature type mismatch. You should discard the sample if the status is not OK.
- decision is the decision of the workflow.
- The columns after decision are the outputs of the workflow. In this example, they are blocked_country, email_domain_in_block_list, loan_outcomes.apr, and loan_outcomes.loan_amount.
- Only input features of primitive types (Boolean, String, Integer, Double) are included. Input features of other types are left out, even if they’re available in executions.
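The column layout described above can be used to separate inputs from outputs and drop failed samples before any analysis. The following is a minimal Python sketch, assuming the order shown in the example (input features, then status, then decision, then workflow outputs). The load_ok_results helper is hypothetical; it raises ValueError if the status or decision column is missing.

```python
import csv
import io


def load_ok_results(result_csv_text: str):
    """Split a backtest result into input columns, output columns, and usable rows."""
    reader = csv.DictReader(io.StringIO(result_csv_text))
    header = reader.fieldnames or []
    # Input features come before "status"; workflow outputs come after "decision".
    input_cols = header[: header.index("status")]
    output_cols = header[header.index("decision") + 1 :]
    # Keep only samples that were backtested successfully.
    ok_rows = [row for row in reader if row["status"] == "OK"]
    return input_cols, output_cols, ok_rows
```

All values are returned as strings, so numeric outputs need converting, for example float(row["loan_outcomes.apr"]).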
You can import the CSV into your data warehouse or BI tools, join it with labels, and compute metrics such as precision, recall, approval rate, and delinquency rate.
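The same join-and-measure step can be prototyped in Python before building it in a warehouse. The following is a minimal sketch under several assumptions. The dataset included an application_id input feature as the join key (the example above has none, so this column is hypothetical). labels is a hypothetical mapping from application ID to whether the application turned out bad (e.g. went delinquent). Any decision other than Approve counts as a decline. These metric definitions are common conventions, not ones Sperta defines.

```python
from dataclasses import dataclass


@dataclass
class BacktestMetrics:
    approval_rate: float      # approved / successfully backtested samples
    delinquency_rate: float   # bad / approved, among labeled samples
    decline_precision: float  # bad / declined, among labeled samples
    decline_recall: float     # declined / bad, among labeled samples


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def compute_metrics(results, labels, key="application_id", approve="Approve"):
    """results: rows as dicts (e.g. from csv.DictReader); labels: {id: is_bad}."""
    ok = [r for r in results if r["status"] == "OK"]  # discard non-OK samples
    labeled = [(r, labels[r[key]]) for r in ok if r[key] in labels]  # inner join

    approved = [r for r in ok if r["decision"] == approve]
    approved_labeled = [bad for r, bad in labeled if r["decision"] == approve]
    declined_labeled = [bad for r, bad in labeled if r["decision"] != approve]
    total_bad = sum(1 for _, bad in labeled if bad)
    true_pos = sum(declined_labeled)  # declined and actually bad

    return BacktestMetrics(
        approval_rate=_ratio(len(approved), len(ok)),
        delinquency_rate=_ratio(sum(approved_labeled), len(approved_labeled)),
        decline_precision=_ratio(true_pos, len(declined_labeled)),
        decline_recall=_ratio(true_pos, total_bad),
    )
```

Running compute_metrics on results from two workflow versions over the same dataset gives a side-by-side comparison. Samples without a label still count toward the approval rate but not toward the label-based metrics.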