MLOps Insights – What to look for in an ML Model Registry

What’s in an ML Model Registry?

When technology providers talk about ML model registries, they describe them as “team collaboration spaces” or “model repositories,” but a registry is so much more. How can enterprises determine what they need and how they can truly benefit from an ML model registry?

 

Organizational Maturity on a Brand’s AI Journey

A robust collaboration space around ML model lifecycle management ensures that users, from data scientists to ML engineers to MLOps engineers, can work as a team to build more mature AI systems and initiatives. As teams collaborate around models, from experimentation to live production, they can capture the granular details and artifacts for each model. In these more structured environments, roles and responsibilities are well defined to enable better collaboration, clear hand-offs, and accountability.

Some enterprises may be just getting started on their AI journeys. With no formal organizational structure, data scientists moonlight as ML engineers, or IT professionals double as MLOps engineers. In these cases, the purpose of an ML model registry might be more about creating a system of record: no matter who is doing the work, there is a rock-solid, trustworthy repository to house the critical artifacts, model history, and other information associated with each model.

Organizations that have started to put a few models into production will want to scale them into a practice that benefits the business. An ML model registry can provide the flexibility to serve as everything from a collaboration space to a system of record to a foundation for operating at scale.

 

Wherever your business is in its AI journey, these are the key things to look for in an ML model registry:

  • It meets the needs of team members and the organization.
  • It supports the way teams work and collaborate.
  • It brings trust, transparency, accountability, traceability, and explainability to ML models moving into or already running in production.

Vian ML Model Registry

As part of our end-to-end Vian MLOps Platform, we provide an enterprise-wide model repository to manage and maintain all ML models in one place, regardless of which tools were used to build the models. Our platform is open and flexible, whereas many other offerings can only import and register models created in certain proprietary tools.

As organizations accelerate the pace at which they deploy models, it becomes increasingly important for users to easily track the specific versions of models that performed best, and to ensure the transformers used in training are available to be packaged with the model. Experiment management becomes critical as in-production models begin to drift and need to be quickly retrained and redeployed. A repository serves as a single source of truth to drive continuous operations.
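To make this concrete, here is a minimal sketch of that pattern, using MLflow purely as a familiar stand-in for a model registry (the model and metric names are illustrative): the preprocessing transformers are bundled with the estimator in one pipeline, so every registered version carries the exact transformations used in training.

```python
# Illustrative sketch only: MLflow stands in for a generic model registry.
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Bundling the transformer with the estimator means the preprocessing
# used in training travels with every registered version of the model.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X, y)

with mlflow.start_run() as run:
    mlflow.log_metric("train_accuracy", pipeline.score(X, y))
    mlflow.sklearn.log_model(pipeline, artifact_path="model")

# Registering creates (or increments) a named, versioned entry that
# downstream teams can treat as the single source of truth.
mlflow.register_model(f"runs:/{run.info.run_id}/model", "fraud-classifier")
```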

Our platform allows extreme flexibility for enterprises regardless of organizational maturity by providing an open-first approach. Platform capabilities are delivered in modules, each wrapped as a service and accessible through the intuitive UI or directly with APIs.

As an organization embarks on its AI journey and thinks about ML model lifecycle management, it can ramp from a small number of models in production to enterprise scale. Those already familiar with the process get rich features, easy access to model data, and automated workflows that help a range of stakeholders easily participate in the process, with the flexibility to bring more models to production in both centralized and decentralized approaches.

Need a better way to inventory and manage models regardless of the tools used to build them? With the Vian ML Model Registry, we provide a common registry to manage any model, enable collaboration and communication between disparate teams, and provide the detailed inventory, model lineage, reproducibility and traceability to build trust in the organization. Want to see a demo? Request one here.

MLOps Insights – 4 Key Considerations when Building a Successful MLOps Strategy

Looking to bridge the gaps between data science, ML operations, and IT? Here are four key considerations, including the most significant potential pitfall of any MLOps strategy.

Machine learning (ML) helps companies leverage massive amounts of data to improve the customer experience, manage inventory, detect fraud, and use predictions to make a host of business decisions. But many companies struggle with Machine Learning Operations – or MLOps – i.e., bringing ML models into production where they can deliver business value.

In its July 2022 report on the state of AI/ML infrastructure, the AI Infrastructure Alliance (AIIA) noted that only 26% of teams surveyed were very satisfied with their current AI/ML infrastructure. That’s understandable: the ecosystem of tools and platforms is highly fragmented, and the process of bringing models to production is disjointed, spanning disparate teams and tools that were not designed to work together. And 73% of respondents said it took 1-2+ years to realize benefits from AI/ML that outweighed the cost of their infrastructure, implementation, and resources.

Although there are many challenges in this process, there are simple steps you can take to help mitigate these roadblocks and begin to streamline the process of MLOps so you can more quickly deploy and maintain trustworthy models running in production.

Collaboration and communication are critical.

MLOps is organized and practiced differently in every organization, with various people who may fill roles at different steps in the process. In some companies, data scientists not only build and train models but also manage model deployment, monitoring, and retraining. In other organizations, data scientists will hand off models to an ML engineer or IT operations team member and return their focus to new model development. Without any knowledge transfer, there is a huge gap between the person who created the model and the person or team that needs to get it into production and know how to retrain it when its performance decays. What was the model intended to do, how was it trained, what data set was used, and what are its key features? Knowing this information is critical for model lineage, reproducibility, and explainability.
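As a sketch, with hypothetical field names, that hand-off information can be captured as a simple model card that is stored alongside the registered model rather than living in someone’s head:

```python
# Hypothetical model-card record; field names are illustrative, not a
# prescribed schema. The point is that intent, training context, and
# data lineage are written down and versioned with the model.
model_card = {
    "name": "fraud-classifier",
    "version": 3,
    "intended_use": "Flag card-not-present transactions for manual review",
    "training_dataset": "transactions_2022_q1.parquet",
    "key_features": ["amount", "merchant_category", "geo_distance"],
    "training_metrics": {"auc": 0.91},  # example value
    "owner": "risk-data-science@example.com",  # hypothetical contact
    "retraining_trigger": "AUC drops more than 0.05 on the weekly holdout",
}
```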

Using a model repository provides an easy way for a data scientist to upload a model that can be accessed and downloaded by another resource who can then view the model and its metadata, artifacts, and version information to identify what may be missing. A repository makes it easy for other team members to maintain new versions of a model when it is retrained. And having this single source of truth where models are stored across the enterprise not only makes it easy to know how many models you have, but also helps mitigate duplication of effort across teams working on similar projects, and can drive further knowledge sharing and collaboration on projects.
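Continuing the illustrative MLflow stand-in from the sketch above, a teammate could list the registered versions of a model and load one for inspection or redeployment:

```python
# Illustrative sketch only: MLflow again stands in for a generic registry.
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Enumerate every registered version of the model, with its lineage
# back to the run that produced it.
for mv in client.search_model_versions("name='fraud-classifier'"):
    print(mv.version, mv.run_id, mv.current_stage)

# Load a specific version to inspect it or hand it to deployment.
model = mlflow.pyfunc.load_model("models:/fraud-classifier/1")
```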

MLOps is different from DevOps.

Most IT organizations are well versed in DevOps – the set of practices that integrate software development with the operational components of testing and deploying software. It leverages automation and the iterative processes of continuous integration and continuous delivery (CI/CD) to drive collaboration, improve software quality, and deliver more frequent and faster releases to users.

While DevOps is mature, MLOps is still evolving and leverages many principles from DevOps to streamline the ML model lifecycle. But unlike traditional software applications that are deployed as executables and remain relatively static once in production, models are highly dependent on data. Their performance will likely change when they begin to ingest data from the real world because this new data, or ‘ground truth,’ is often different from the data on which they were trained. ML teams can keep models more reliable and accurate by implementing continuous training – the process of automating model retraining with new data.
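Stripped to its essence, continuous training is a loop like the following sketch; the helper functions are stubs for whatever your stack provides, and the accuracy floor is an assumed example value, not a recommendation.

```python
# Minimal continuous-training sketch; all helpers are stubs, and the
# threshold is an assumed example.
ACCURACY_FLOOR = 0.85

def evaluate_on_recent_data(model, data):
    """Stub: score the model on freshly labeled production data."""
    return 1.0  # placeholder; plug in your real evaluation

def retrain(model, data):
    """Stub: fit a new model version on recent (ground-truth) data."""
    return model  # placeholder

def deploy(model):
    """Stub: push the new version through your deployment pipeline."""

def continuous_training_step(model, recent_data):
    """Retrain and redeploy only when live performance drops below the floor."""
    accuracy = evaluate_on_recent_data(model, recent_data)
    if accuracy < ACCURACY_FLOOR:
        new_model = retrain(model, recent_data)
        deploy(new_model)
        return new_model
    return model
```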

In many companies, IT ops resources are expected to take on responsibilities in MLOps; while the processes are similar, there are significant differences. The ability to communicate with other stakeholders across this process and use tools that automate significant steps will help to make the process more seamless.

No single tool or platform does everything perfectly.

Many companies have a combination of tools, platforms, and home-grown solutions that they piece together to bring ML models into production. But not everything works well together, and if there are a lot of teams working on different ML projects, it can become cumbersome and difficult to streamline the process.

Often, it’s hard to cut through the marketing hype to really understand what each tool does and, more importantly, what it doesn’t. The AIIA recommends using one or two core platforms for data processing, pipeline versioning, experiment tracking, and deployment – these could be products that were built in-house or packaged solutions. Then, consider adding tools that are best-of-breed for your needs, particularly for monitoring, observability, and explainability.

The hard part doesn’t start until models are running in production environments.

Although the road to model deployment may be difficult, the real work begins once models are deployed and are being fed data from the real world. Too often, people don’t have access to the key data they need to make critical decisions about a model’s performance, and when or whether to take it out of production, retrain it, and redeploy it. This is the biggest potential pitfall for most enterprises, as many assume that models will behave the same in production as they do in training environments.

The data a model sees in production can differ significantly from the data it was trained on for a number of reasons. For example, a model may start to incorrectly predict that fraud won’t occur in previously well-understood situations because criminals have found a new way of spoofing a phone number or location. Or there could be a significant shift in buying patterns the model did not account for, or perhaps the data used to train the model excluded key features.

Teams need access to tools that monitor model performance, set thresholds for performance deviation, and alert them as soon as thresholds are met. Those tools also need to provide insight into what happened, why it happened, and how to fix it. Ideally, your monitoring solution will kick off a workflow that manages this process, including the retraining and redeployment of your models. Automating this process is the fastest, easiest way to mitigate model downtime and achieve continuous model operations.
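A minimal sketch of the idea, with illustrative metric names, example limits, and stubbed alerting hooks, looks like this:

```python
# Threshold-based monitoring sketch; metric names and limits are
# illustrative, and the hooks are stubs for your alerting and workflow tools.
THRESHOLDS = {
    "prediction_drift": 0.20,
    "feature_drift": 0.25,
    "accuracy_drop": 0.05,
}

def send_alert(metric, value, limit):
    """Stub: notify the team (email, chat, pager) when a limit is crossed."""
    print(f"ALERT: {metric}={value:.3f} exceeded limit {limit:.3f}")

def trigger_retraining_workflow(metric):
    """Stub: kick off the automated retrain-and-redeploy workflow."""
    print(f"Starting retraining workflow (triggered by {metric})")

def check_metrics(latest_metrics):
    """Compare the latest monitoring metrics against their thresholds."""
    for metric, value in latest_metrics.items():
        limit = THRESHOLDS.get(metric)
        if limit is not None and value > limit:
            send_alert(metric, value, limit)
            trigger_retraining_workflow(metric)

check_metrics({"prediction_drift": 0.31, "accuracy_drop": 0.02})
```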

Successful monitoring is about mitigating risk, and it requires collaboration across the disparate teams involved in the MLOps process. The data engineers and data scientists want to provide the business with models that are based on the right data and ensure they can be quickly retrained when performance decays. Business users need to have trust that models will make accurate and reliable predictions, and for highly regulated industries, they need to ensure explainability and auditability for compliance.

For more insights into monitoring, read our recent blog detailing Seven Critical Capabilities to Look for in an ML Model Monitoring Solution.

A unified strategy, with a modular approach to MLOps, provides the foundation for success.

The Vian H+AI MLOps Platform was built to help companies accelerate ML models into production, monitor their performance to ensure trust and reliability, and enable continuous, low-touch or no-touch operations with automated model retraining and redeployment. The Vian MLOps Platform offers:

  • A simplified, no code/low code interface to make it easy for all stakeholders to import a model, explore and validate data, retrain, deploy, and monitor models with the most comprehensive risk assessments available.
  • API access to all functionality across the platform for users who want to consume capabilities directly from their own tools rather than through the UI.
  • A rich model repository that serves as the single source of truth for models across the enterprise, with an intuitive UX that provides access to all model details including versioning, data sets, and complete lineage.
  • Unparalleled model performance optimization (execution speed and throughput) to run ML models on commodity hardware, including informing users when optimization can help reduce run-time costs.

As companies accelerate their adoption of AI and ML, the Vian MLOps Platform helps them quickly realize business value while eliminating bottlenecks and streamlining the process of operationalizing ML.

Bridge the gaps between Data Science, ML Operations, and IT with a comprehensive, but modular platform to onboard, optimize and monitor ML models at scale.
Want to learn more? Contact us.

MLOps Insights – 7 critical capabilities to look for in an ML Model Monitoring solution

Are ML Models Increasing Risks and Vulnerabilities for your Business, or Delivering Trusted Insights that Drive Transformational Decision-Making? 

There are a lot of Machine Learning (ML) monitoring tools on the market today, and it’s difficult to tell them apart, make decisions about which one is right for your needs, and then seamlessly implement those tools to start seeing the value. When considering ML Model Monitoring tools, here are a few critical areas to consider. 

1.      Drift – For ML models in production, drift refers to feature, prediction, and performance drift. Drift detection and alerts surface questions such as: has something changed in the underlying assumptions that is causing the model to drift in the wrong direction? (A simple drift check appears in the sketch after this list.)

2.      Uncertainty Analysis & Outliers – Uncertainty Analysis in ML models is about confidence, or lack thereof, in the accuracy of the data and features within the model. Uncertainty Analysis capabilities within ML monitoring tools validate data pre-deployment and re-evaluate it over time as part of a monitoring plan.

3.      Bias & Fairness – These metrics measure when models make predictions that are systematically distorted due to incorrect assumptions about data, inherently inaccurate data, inadvertently excluded data, or spurious relationships in data. Fairness tests can be applied to check for bias during data pre-processing, model training, and post-processing of the model’s results (one such test appears in the sketch after this list).

4.      Adversarial Analysis – This monitoring method, which doubles as a training method, feeds models deceptive data to try to trick them. It both generates and detects deceptive inputs and tracks whether the model’s predictions are fooled.

5.      Data Quality & Integrity – Quality data is accurate and reliable, while data integrity ensures data is complete, valid, and consistent. To ensure trustworthy models, data quality rules and automated monitoring capabilities can help teams identify poor quality data used in training, pipelines, or during transformations, and also automate data cleansing and other tasks. 

6.      Robustness, Stability & Sensitivity – Models are robust and stable when they make accurate predictions that do not change significantly under varying conditions, and they have high sensitivity when they accurately detect positive instances – in other words, their True Positive Rate is high (computed in the sketch after this list).

7.      Modularity – Just as important as the depth of monitoring capabilities within individual tools is whether the tools you are evaluating can integrate seamlessly into your landscape and your MLOps workflows. Look for tools that are based on open-source technologies, that can provide a unified user experience across multiple tools and technologies to deliver robust monitoring and explainability (or can tuck into your UI if that is your preference), and that are built for complex, highly customized landscapes and business needs.
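To ground a few of these capabilities, here is a minimal sketch of three of the checks above: a two-sample Kolmogorov-Smirnov test for feature drift (item 1), the demographic parity gap for fairness (item 3), and the True Positive Rate for sensitivity (item 6). The significance level and data are illustrative only.

```python
# Minimal sketch of three monitoring checks; thresholds and data are
# illustrative, not recommendations.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import recall_score

def feature_drifted(train_col, live_col, alpha=0.05):
    """Flag drift when the live feature distribution differs from training
    (two-sample KS test)."""
    _statistic, p_value = ks_2samp(train_col, live_col)
    return p_value < alpha

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups;
    larger gaps suggest systematically distorted predictions."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def sensitivity(y_true, y_pred):
    """True Positive Rate: recall on the positive class."""
    return recall_score(y_true, y_pred)

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 1_000)
live = rng.normal(0.5, 1.0, 1_000)  # shifted distribution: should flag drift
print(feature_drifted(train, live))  # True
```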

It’s just as important to know what is in a monitoring solution when starting a search as it is to know what’s not included. When evaluating monitoring solutions, be sure to look for these seven key aspects to avoid surprises later, especially around missing capabilities. ML model monitoring is a critical aspect of running trusted, explainable models that bring fair and unbiased intelligence to decision-makers – models that don’t create risk due to drift, bias, data quality problems, and other issues, but instead deliver the insights and trusted intelligence to transform and accelerate the business.

Prevent risk and get the value with comprehensive, scalable ML model monitoring. Want to learn more? Contact us.

MLOps Insights – Collaboration

Improve collaboration around Machine Learning Ops and get more models into production.

Author: Dr. Navin Budhiraja, CTO & Head of the Vian Platform, Vianai Systems

 

The idea that collaboration can be a challenge is not new, especially to those implementing or leveraging new technologies in large organizations. Collaboration across stakeholders has always been hard, but it becomes significantly more complex with AI-based systems. As AI initiatives gain momentum within organizations, the challenge of operationalizing and managing these models at scale becomes overwhelming for most enterprises. Data and analytics initiatives were already challenged by organizational silos, difficult-to-use tools, cost-performance tradeoffs, and risk; now we have intelligent, learning, changing models that need to be optimized, continuously evaluated for risks, managed and monitored for drift, and, when needed, retrained and redeployed. Between data scientists, machine learning (ML) engineers, ML operations engineers, information technology (IT) teams, risk and governance teams, and even the end users, there is one thing that technology simply cannot entirely replace but can certainly enhance: collaboration between human contributors.

Instead of trying to force organizations to solve the complexity, we have taken a human-centered approach: abstract away the complexity for the various stakeholders across the ML model lifecycle.

Vianai is out to alleviate the complex, time-consuming, and costly pains that arise in such a large mix of stakeholders by deploying unique tools via our Vian H+AI MLOps Platform to facilitate optimized collaboration among teams in the ML model development and operation (MLOps) process.

While there is notable momentum around AI initiatives, and most organizations have created some data science practices to build models, massive challenges in ML model operations – moving models from validation into production – remain to be addressed. AI experts at the 2019 Transform Conference estimated that more than 80% of ML models built won’t make it into production. And our experts have observed that of those that do, less than 20% affect business outcomes as desired. We know there are major cost and infrastructure issues that contribute to these poor statistics, and that modern enterprises can be overwhelmingly complex. But arguably more important is the inconvenient truth that ineffective collaboration can level a project before it truly begins.

So, collaboration is key. Now what?

Based on our work with customers, we see several areas where the roadblocks in collaboration can be solved:

1. We may know how to talk to each other, but do we know how to communicate? Technical language fluency varies by individual and is often the first obstacle to effective collaboration. Multiple stakeholders need to be on the same page for a model to be successful: executives, the technical team building the model (internal, contracted, or a mix of both), governance/regulatory oversight professionals, and the IT team operating the model. However, with each team speaking a language native to its area of expertise, key information can get lost in technical translation. To address this, we created a platform with a simple, unified, and intuitive interface that seamlessly integrates with market-leading tools and open-source products used across the ML model development and operationalization process. This means you can continue using the software tools you already have, and each stakeholder gets access to the applicable information, tailored to their needs.

2. Models are different from traditional software. They are highly dependent upon data that is constantly changing, and once deployed and exposed to real-world data, models themselves also change. The bigger models get, the more accurate they get – at a significant cost. Today, models are rapidly growing in both size and complexity, requiring GPUs or NPUs that are cost-prohibitive for all but a handful of the biggest companies in the world. With the Vian MLOps Platform, companies no longer need to choose between cost and performance. Our Performance Optimization enables companies to run models on commodity hardware with performance gains of 100x-10000x. Even after a model is deployed, the Vian Platform can proactively monitor the model for potential optimizations, including speed and cost, easing the collaboration between model creators and IT. This cost efficiency is even more relevant as companies deploy models on a variety of edge devices, many of which run legacy hardware.

3. Lastly, there is the challenge of model and data drift: ML model effectiveness can change over time. You aren’t going to get anywhere fast unless your team, and your software, are built with the agility to account for drift. We have observed that data scientists spend about 80% of their time figuring out how to get the data they need to train their models. Even with all that time devoted to proper data and metrics, there will always be variables left undiscovered until post-production. For example, something as unexpected as an international pandemic can change consumer behavior for a model forecasting supply and demand, requiring algorithmic retooling. At Vianai, we take a comprehensive approach to mitigating risk, including a proprietary solution that evaluates uncertainty in your data and guides you to collect more data or identify additional signals. If your model doesn’t seem reliable, we will try to pinpoint exactly what data is needed to make it more reliable. In the long run, this saves you time and, ultimately, money. Instead of focusing your internal collaboration on what your roadblocks might be, you can quickly get to solving them. Our goal is to create tools that get your models into production faster. With these tools in place, agile processes push things into production more quickly and efficiently by removing overhead, confusing conversations, and the old waterfall process.

We believe technology doesn’t remove people from the equation, but it can improve collaboration among them. There may still be a shortage of data scientists, but our tools help alleviate the pain of the talent gap while bringing teams onto the same page. Ultimately, our goal is to get your models into production in the most cost-effective, time-saving, and secure way possible.

If you’d like to learn more about how we can help do that for you, we would love to connect.