QUICKSTART

VIANOPS Free Trial

Welcome to the VIANOPS free trial.

We hope you enjoy this tutorial as it walks you through our monitoring solution capabilities using an example model available in our free trial – a taxicab fare prediction model.

The tutorial showcases key capabilities of the VIANOPS platform that simplify monitoring and provide deep insight into feature drift, prediction drift, and model performance: over custom-defined periods of time, in comparison with other features, and across different segments of data.

Please review both parts of this document to understand concepts, terminology and key platform capabilities.

Part 1 – Understanding the Terminology
Part 2 – Exploring Model Dashboard and Investigating Performance Drop

For additional information, please visit https://vianops.ai/ or check out our Docs – https://docs.vianops.ai

Part 1 – Understanding the Terminology

Baseline & Target Windows

The Baseline Window and the Target Window are the two data frames that are compared to measure drift in the value distribution of an input feature or output prediction, or to measure a change in model performance.

The Baseline window is the point of reference against which a feature’s value distribution or a model’s performance is compared. A baseline window can be drawn from different sets of data (training data or prior time periods of production data). In this tutorial, the baseline window refers to prior time periods of production data. With VIANOPS, these periods can be any timeframe that matters to the business, for example the prior day, the same weekday in each of the last 3 weeks, the prior week, the prior month, or the prior quarter.

The Target window is the data frame being monitored and compared with the Baseline window. Target windows can be the current day (last 24 hours), week to date (first day of the week up to the current day), month to date (first day of the month up to the current day), or other custom time segments.
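The window definitions above can be sketched as simple date arithmetic. This is only an illustration of the concepts, not the VIANOPS API: the function names are hypothetical, and the week is assumed to start on Monday.

```python
from __future__ import annotations

from datetime import date, timedelta


def week_to_date(today: date) -> tuple[date, date]:
    """Target window: first day of the week (Monday assumed) up to today."""
    start = today - timedelta(days=today.weekday())
    return start, today


def month_to_date(today: date) -> tuple[date, date]:
    """Target window: first day of the month up to today."""
    return today.replace(day=1), today


def prior_weeks(today: date, n_weeks: int) -> tuple[date, date]:
    """Baseline window: the n full weeks that precede the current week."""
    this_week_start = today - timedelta(days=today.weekday())
    start = this_week_start - timedelta(weeks=n_weeks)
    end = this_week_start - timedelta(days=1)
    return start, end


def same_weekday_of_last_weeks(today: date, n_weeks: int) -> list[date]:
    """Baseline window: the same weekday in each of the last n weeks."""
    return [today - timedelta(weeks=k) for k in range(1, n_weeks + 1)]


today = date(2023, 6, 14)  # an arbitrary example date (a Wednesday)
print("week to date:   ", week_to_date(today))
print("month to date:  ", month_to_date(today))
print("past 2 weeks:   ", prior_weeks(today, 2))
print("last 3 weekdays:", same_weekday_of_last_weeks(today, 3))
```

A week-to-date target paired with a past-2-weeks baseline matches the window settings of the week-to-week feature drift policy explored in Part 2.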

Feature: An input to a model that represents a measurable piece of data. For example, a feature for a taxi fare prediction can be Estimated Trip Distance, Pick-up Location, Destination Location, etc. A model typically has tens to thousands of features.
Prediction: The output of a model. It differs by model type. For example, if the model is a binary classification model, the value is 0 or 1; if it’s a regression model, the prediction is a numeric value.
Drift: Changes in the value distribution between the Target Window and Baseline Window for an input feature or output prediction, or changes in model performance.
Feature drift: Changes in the value distribution of a feature in a target window compared to the baseline window. For example, feature drift of Destination Locations this month compared to the same month last year.
Overall feature drift: Models have multiple features, and each feature is monitored independently for drift. Overall feature drift is the aggregation of drift from all features that are monitored in a policy. This represents overall drift at a policy level and is used to trigger alerts.
Prediction drift: Changes in the value distribution of predictions made by the model between the target window and baseline window.
Performance drift: Changes in a model’s performance metric over time. Performance metrics differ by model type. For example, for a regression model, performance metrics can be Mean Absolute Error (MAE), Mean Squared Error (MSE), or Root Mean Squared Error (RMSE). To better visualize the data, the model dashboard for this sample model uses Negative MSE (NMSE) or Negative MAE (NMAE).
Policy: A set of rules that define a process to monitor drift and alert users as it happens. VIANOPS supports feature drift, prediction drift, and performance drift policies in this release. You can define multiple drift policies with different settings to monitor models from different dimensions. These settings include target and baseline windows, drift metrics, alert thresholds, schedules to run policies, selected segments, and selected features.
Segment: A subsection of a data set used to narrow the scope of focus and more quickly uncover patterns that occur only in certain sections of the population but may affect model behavior. For example, you can define a segment where destination location is ‘Manhattan’ or a segment where State is ‘CA’ and Age Group is ‘Senior.’ VIANOPS allows users to view and compare performance and drift across multiple segments at the same time.
Alert: An informational flag that drift has reached a pre-defined threshold in a policy. Alert levels include Critical (severe and needs immediate attention) and Warning (less than critical but may need attention).
Value distribution: The number of times a feature’s values fell into each bin (across a range of values) during a target or baseline window. For example, the estimated time for a taxicab ride may range from 5 minutes to 37 minutes for a specific target or baseline window. VIANOPS allows users to customize how a range of values is grouped so that it is easy to understand in a business context. For example, instead of using standardized groupings of values (<5, 5-10, 10-15, and so on), users can create custom groups to better view and expose patterns, such as <5, 5-15, 15-45, 45-60, and >60 minutes. By default, VIANOPS groups the values into 10 bins for continuous features. For categorical features, the categories are the bins.
The data set: The NYC Taxi and Limousine Commission (TLC
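
To make the terms above concrete, here is a minimal sketch of how a value distribution, a per-feature drift score, an overall drift score, and an alert level could fit together. It rests on several assumptions that do not come from VIANOPS: the Population Stability Index (PSI) is used as the drift metric, overall drift is taken as a plain mean of the per-feature scores, and the feature names, sample values, and thresholds are made up for illustration.

```python
import math
from bisect import bisect_right


def value_distribution(values, edges):
    """Return the fraction of values falling into each bin.

    `edges` are the interior cut points, so [5, 15, 45, 60] yields five bins:
    <5, 5-15, 15-45, 45-60 and 60+ (a boundary value goes to the upper bin).
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(baseline_dist, target_dist, eps=1e-6):
    """Population Stability Index between two binned distributions."""
    total = 0.0
    for b, t in zip(baseline_dist, target_dist):
        b, t = max(b, eps), max(t, eps)  # avoid log(0) for empty bins
        total += (t - b) * math.log(t / b)
    return total


def overall_feature_drift(per_feature_drift):
    """Aggregate per-feature drift; a plain mean is an assumption here."""
    return sum(per_feature_drift.values()) / len(per_feature_drift)


def alert_level(drift, warning, critical):
    """Map a drift value to an alert level, or None if below both thresholds."""
    if drift >= critical:
        return "Critical"
    if drift >= warning:
        return "Warning"
    return None


# Made-up trip durations in minutes, grouped with the custom bins from the text.
edges = [5, 15, 45, 60]
baseline_minutes = [4, 8, 12, 14, 20, 25, 30, 40, 50, 65]
target_minutes = [12, 30, 35, 40, 44, 50, 55, 58, 62, 70]

baseline_dist = value_distribution(baseline_minutes, edges)
target_dist = value_distribution(target_minutes, edges)

# Hypothetical feature names; the second score is a placeholder value.
drift = {"est_trip_time": psi(baseline_dist, target_dist), "pickup_hour": 0.02}
level = alert_level(overall_feature_drift(drift), warning=0.1, critical=0.25)
print(baseline_dist, target_dist, level)
```

For a categorical feature the same idea applies with the categories as bins, so the binning step reduces to counting occurrences per category.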

Part 2 – Exploring Model Dashboard and Investigating Performance Drop

1. Launch the Model Dashboard of the Taxi fare model. The dashboard has several sections, including Latest Performance, Prediction Metrics, Alert Summary, Alerts list, Policy list, and Segment list.

2. Check out Latest Performance in the upper-left corner and notice the drop in performance that occurred a few days earlier.

3. Change the date range selector in the upper-right corner to Last 30 Days to check whether any performance drop occurred in the past month as well. Then change it back to Last 7 Days.

4. Explore the inference traffic in Prediction Metrics over time and note that the average daily traffic in the last 7 days is fairly consistent.

Figure: Performance and Prediction Metrics for steps 2-4

5. Go to Alert Summary and check the distribution of alerts across 4 different perspectives: severity, policy type, policy, and segment. Mouse over the donut charts to see more details.
6. Go to the Alerts list and sort the alerts by Severity to view all critical alerts. You will see two alerts from the MAE performance policy as well as several critical alerts in the Week-to-week drift w/segments policy (Policy Name column) and the Williamsburg-Manhattan segment (Segment column). Review the Target window and Baseline window to see the difference in date range.
7. Scroll down to the Policy List to see the 4 policies created for this model. Recent Runs gives users quick insight into whether any critical alert (red dot) or warning alert (orange dot) was triggered in the last runs. For example, you may notice that the Week-to-week drift w/segments policy has many alerts in the recent runs, but the Month-to-month drift policy has neither warnings nor critical alerts.
8. Click the MAE performance policy to explore model performance further.
9. Expand the Policy Information section to see that the policy type is “performance drift,” that the policy compares the current day’s performance to the prior day’s performance, and that the alert thresholds are set at 20 for warnings and 50 for critical alerts.
10. Hide Policy Information. Scroll down to the Percentage change in mae chart and click the circle where you see the biggest drop in performance. This updates the other charts on the page: the alert on the left is highlighted, the Performance Details table underneath displays the actual performance on the selected date and the performance change compared to the prior day, and the Williamsburg-Manhattan segment is highlighted.
11. On the Performance Trend chart, you can see the actual performance trend for all data as well as for the two segments, Williamsburg-Manhattan and Brooklyn-Manhattan. Notice that Williamsburg-Manhattan has a significant performance drop while performance is stable in the Brooklyn-Manhattan segment. The performance drop in the Williamsburg-Manhattan segment is large enough to affect All Data.
12. Now let’s find out what potentially caused the performance drop. Scroll down to Related Policies at the bottom and open the Week-to-week drift w/segment policy, which has many critical alerts. (Note that other policies can also be accessed by clicking the Policies link on the left navigation bar of the Model Dashboard.)
13. In the Week-to-week drift w/segment policy, expand Policy Information. You can see that the policy type is feature drift, the Target window is week to date, and the Baseline window is the past 2 weeks.
14. Go to the Overall Feature Drift chart. You can see that drift in the Williamsburg-Manhattan segment continues to increase while the Brooklyn-Manhattan segment and All Data remain relatively flat.
15. Click a circle for the Williamsburg-Manhattan segment that has a high drift value. Note that the charts below refresh to reflect the corresponding target and baseline windows for this segment, as well as the level of drift for each feature in this policy.
16. Click the est_trip_distance feature and view the Value Distribution chart, which compares the value distribution of the target window (current week to date) and the baseline window (last 2 weeks) for this feature. Notice that in the target window the majority of trips were longer in distance, while the baseline window has a more even distribution across short and long distances.
17. Scroll back to the Overall Feature Drift chart and click the circle on the Williamsburg-Manhattan segment for the day before the big drift was first observed. Select the est_trip_distance feature again in the feature list, and notice that the Value Distribution chart shows a similar distribution for the feature in the target and baseline windows.
18. Select other circles for All Data or the Brooklyn-Manhattan segment and repeat the steps above: select a feature in the list and compare its value distribution between the target and baseline windows to observe the difference.
19. To drill down further to understand when drift occurred and to identify factors that point to why, open the Day-to-day drift w/segment policy by clicking it in Related Policies.
20. Notice the spike when drift first occurs. Note the downward line the next day, which indicates that the value distribution of features continues with the new normal (i.e., the shifted distribution stayed at the same level as the prior day, so the day-over-day comparison shows little new drift).
21. Select a circle in the Overall Feature Drift chart and make the same observations as you did in the Week-to-week drift w/segment policy: select a feature and check its value distribution in the target window in comparison to the baseline window.
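
The performance drift check from steps 9-11 can be sketched as a day-over-day percentage change in MAE per segment, compared against the policy thresholds of 20 (warning) and 50 (critical). This is a rough illustration rather than the VIANOPS implementation: it assumes the thresholds are percentages applied to the magnitude of the change, and the daily MAE values are made-up numbers, not output from the sample model.

```python
def mae(actual, predicted):
    """Mean Absolute Error between actual and predicted fares."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pct_change(current, prior):
    """Percentage change of the current value relative to the prior value."""
    return (current - prior) / prior * 100.0


def alert_level(pct, warning=20.0, critical=50.0):
    """Alert level for a percentage change, or None if below both thresholds."""
    magnitude = abs(pct)
    if magnitude >= critical:
        return "Critical"
    if magnitude >= warning:
        return "Warning"
    return None


# Computing MAE for one day from a handful of hypothetical fares.
print("example MAE:", mae([12.0, 30.0, 18.5], [11.0, 33.0, 18.0]))

# Hypothetical daily MAE per segment, oldest day first.
daily_mae = {
    "All Data": [2.0, 2.1, 2.6],
    "Williamsburg-Manhattan": [2.2, 2.3, 4.6],
    "Brooklyn-Manhattan": [2.0, 2.0, 2.1],
}

# Compare the target window (current day) with the baseline window (prior day).
results = {}
for segment, series in daily_mae.items():
    change = pct_change(series[-1], series[-2])
    results[segment] = alert_level(change)
    print(f"{segment}: {change:+.1f}% -> {results[segment]}")
```

With these illustrative numbers, the segment whose error doubles crosses the critical threshold and also pulls All Data past the warning threshold, while the stable segment raises no alert, which mirrors the pattern seen on the Performance Trend chart.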

Summary

Based on all the metrics we have observed, we know there was a significant change in the Williamsburg-Manhattan segment a few days ago, and the new pattern continues. A quick internet search reveals that a one-week construction project began on the day drift was first observed, and the closure of the Williamsburg Bridge resulted in much longer trip distances and trip times from Williamsburg to Manhattan. As this is a temporary problem, you do not need to retrain your model. Instead, you can simply add a courtesy pop-up message to the taxi app to alert customers to longer travel times during the next few days due to construction.

We hope this tutorial was helpful. If you have any questions, please don’t hesitate to reach out! You can reach us at vianops@vian.ai