MLOps Insights - Collaboration

Improve collaboration around Machine Learning Ops and get more models into production.

Author: Dr. Navin Budhiraja, CTO & Head of the Vian Platform, Vianai Systems

Collaboration has always been a challenge, especially for those implementing or leveraging new technologies in large organizations. Working across stakeholders has never been easy, but it becomes significantly more complex when dealing with AI-based systems. As AI initiatives gain momentum within organizations, the challenge of operationalizing and managing these models at scale becomes overwhelming for most enterprises. Data and analytics initiatives were already challenged by organizational silos, difficult-to-use tools, cost-performance tradeoffs and risk. Now we also have intelligent, learning, changing models that need to be optimized, continuously evaluated for risks, managed and monitored for drift, and, when needed, retrained and redeployed. Among data scientists, machine learning (ML) engineers, ML operations engineers, information technology (IT) teams, risk and governance teams, and even the end users, there is one thing that technology simply cannot entirely replace but can certainly enhance: collaboration between human contributors.

Vianai is out to alleviate the complex, time-consuming, and costly pains that arise in such a large mix of stakeholders by deploying unique tools via our Vian H+AI MLOps Platform to facilitate optimized collaboration among teams in the ML model development and operation (MLOps) process.

While there is notable momentum around AI initiatives, and most organizations have created some data science practices to build models, there are massive challenges in ML model operations – moving models from validation into production – that have yet to be addressed. AI experts at the 2019 Transform Conference reported that more than 80% of ML models built won’t make it into production. And our experts have observed that, of those that do reach production, fewer than 20% affect business outcomes as desired. We know there are major cost and infrastructure issues that contribute to these poor stats, as well as the fact that modern enterprises can be overwhelmingly complex. But arguably more important is the inconvenient truth that ineffective collaboration can level a project before it truly begins. Instead of trying to force organizations to solve the complexity, we have taken a human-centered approach, which is to abstract away the complexity for the various stakeholders across the ML model lifecycle.

So, collaboration is key. Now what? Based on our work with customers, we see several areas where the roadblocks in collaboration can be solved:

1. We may know how to talk to each other, but do we know how to communicate? Technical language fluency varies per individual and can be considered the first obstacle to effective collaboration. Multiple stakeholders need to be on the same page for a model to be successful: executives, the technical team building the model (internal, contracted or a mix of both), governance/regulatory oversight professionals and the IT team operating the model. However, with each team speaking a language native to its area of expertise, key information can get lost in technical translation. To address this, we created a platform with a simple, unified, and intuitive interface that seamlessly integrates with market-leading tools and open-source products used across the ML model development and operationalization process. This means you can continue using the software tools you already have, and each stakeholder is given access to the applicable information, tailored to their needs.

2. Models are different from traditional software. They are highly dependent upon data that is constantly changing, and once deployed and exposed to real-world data, models themselves also change. The bigger models get, the more accurate they get, but at a significant cost. Today, models are rapidly growing in both size and complexity, requiring GPUs or NPUs that are cost-prohibitive for all but a handful of the biggest companies in the world. With the Vian MLOps Platform, companies no longer need to choose between cost and performance. Our Performance Optimization enables companies to run models on commodity hardware with performance gains of 100x-10000x. Even after a model is deployed, the Vian Platform can proactively monitor the model for potential optimizations, including speed and cost, thus easing the collaboration between the model creators and IT. This cost efficiency is even more relevant as companies deploy models on a variety of edge devices, many of which run legacy hardware.

3. Lastly, there is the challenge of model and data drift: ML model effectiveness can change over time. You aren’t going to get anywhere fast unless your team and software are built with the agility to account for drift. We have observed that data scientists spend about 80% of their time figuring out how to get the data they need to train their models. Even with all that time devoted to proper data and metrics, there will always be variables left undiscovered until post-production. As an example, something as unexpected as an international pandemic can change consumer behavior for a model that is forecasting supply and demand, which will require retooling the algorithm. At Vianai, we have a comprehensive approach to mitigating risk, including a proprietary solution to evaluate uncertainty in your data and guide you to collect more data or identify additional signals. If your model doesn’t seem reliable, we will try to pinpoint exactly what data is needed to make it more reliable. In the long run, this will save you time and, ultimately, money. Instead of focusing your internal collaboration on what your roadblocks might be, you can quickly get to solving them. Our goal is to create tools to get your models into production faster. When tools become available, agile processes push things into production more quickly and efficiently by removing overhead, confusing conversations, and that old waterfall process.
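
To make the idea of data drift concrete, here is a minimal sketch of one common, generic way to flag it: the Population Stability Index (PSI), which compares how a feature was distributed at training time with how it is distributed in production. This is not Vianai's proprietary method; the function name, the bin count, and the synthetic demand figures are all assumptions made for illustration.

```python
import math
import random


def population_stability_index(reference, current, bins=10):
    """PSI between two numeric samples, using equal-width bins
    spanning the range of the reference sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the range is zero

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Synthetic "weekly demand" data (illustrative only, not real figures).
rng = random.Random(0)
training_demand = [rng.gauss(100, 15) for _ in range(5000)]
stable_demand = [rng.gauss(100, 15) for _ in range(5000)]
shifted_demand = [rng.gauss(130, 15) for _ in range(5000)]  # behavior changed

psi_stable = population_stability_index(training_demand, stable_demand)
psi_shifted = population_stability_index(training_demand, shifted_demand)

print(f"PSI, stable period:  {psi_stable:.3f}")
print(f"PSI, shifted period: {psi_shifted:.3f}")
```

A commonly cited rule of thumb treats a PSI below roughly 0.1 as stable and above roughly 0.25 as a shift worth investigating, though the right thresholds depend on the feature and the business context. A check like this gives data scientists, IT, and risk teams a shared number to discuss when deciding whether a model needs retraining.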

We believe technology doesn’t remove people from the equation, but it can improve collaboration among them. There may still be a shortage of data scientists, but our tools help alleviate the pain of the talent gap while bringing teams onto the same page. Ultimately our goal is to get your models into production in the most cost-effective, time-saving, and secure way possible.

If you’d like to learn more about how we can help do that for you, we would love to connect.
