San Jose, CA (Strata + Hadoop World, Booth #1221) – March 6, 2018 – Hitachi Vantara, a wholly owned subsidiary of Hitachi, Ltd. (TSE:6501), today announced additional capabilities for machine learning orchestration to help data scientists monitor, test, retrain and redeploy supervised models in production. These new tools, an innovation from Hitachi Vantara Labs collectively known as “machine learning model management,” can be used in a data pipeline built in Pentaho to help improve business outcomes and reduce risk by making it easier to update models in response to continual change. Improved transparency gives people inside organizations better insight into, and confidence in, their algorithms. Hitachi Vantara Labs is making machine learning model management available as a plug-in through the Pentaho Marketplace.
As organizations transform digitally, their algorithms become a key competitive advantage – and potentially a risk. Once a model is in production, it must be monitored, tested and retrained continually in response to changing conditions, then redeployed. Today this work involves considerable manual effort and, consequently, is often done infrequently. When this happens, prediction accuracy will deteriorate and impact the profitability of data-driven businesses.
David Menninger, SVP & Research Director, Ventana Research, said, “According to our research, two-thirds of organizations do not have an automated process to seamlessly update their predictive analytics models. As a result, less than one-quarter of machine learning models are updated daily, approximately one-third are updated weekly and just over half are updated monthly. Out-of-date models can create significant risk to organizations.”
New data science model management improves the process of machine learning deployments in three areas:
Get models into production faster: New machine learning orchestration steps support data and feature engineering. These steps evaluate models and improve their accuracy using real production data before going live. For further model tuning and to avoid overfitting, data operations teams can generalize models against production test data using a choice of cross-validation and holdout evaluation techniques. Algorithm-specific data preparation and cleaning tasks – also referred to as “last mile data prep” – are now automated. Operations teams can adjust model parameters using a simple GUI instead of writing and maintaining code, which frees data scientists to develop new models.
Maximize model accuracy while in production: Once a model is in production, its accuracy typically degrades as new production data runs through it. To counter this, a new range of evaluation statistics helps to identify degraded models. Rich visualizations and reports make it easier to analyze model performance and uncover errors. When updates or changes occur, new “challenger” models can be easily A/B-tested against the current “champion” models. Since test results are returned faster, the model can be adjusted sooner.
Collaborate and govern model operations at scale: More organizations are demanding visibility into how algorithms make decisions. Lack of transparency often leads to poor collaboration among the groups that deploy and maintain models, including operations teams, data scientists, data engineers, developers and application architects. These new capabilities from Hitachi Vantara promote collaboration by providing data lineage of model steps and visibility into the data sources and features that feed the model. This greater transparency allows data and data pipelines to be easily shared, standardized and reused across teams, so new machine learning applications can be built faster. Because the machine learning model steps are embedded into data pipelines on an enterprise-grade platform, they can run large data volumes in a highly available and secure environment.
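For readers unfamiliar with the evaluation techniques named above, the following is a minimal, illustrative Python sketch of cross-validation, holdout evaluation and a champion/challenger comparison. It does not use or represent the Pentaho plug-in. The synthetic data, the single-threshold "model" and every function name are assumptions made purely for illustration.

```python
# Illustrative sketch only: a toy one-feature threshold classifier, not the Pentaho steps.
# All names and data here are hypothetical.

def predict(threshold, score):
    """Predict class 1 when the score reaches the threshold, else class 0."""
    return int(score >= threshold)

def accuracy(threshold, rows):
    """Fraction of (score, label) rows that the threshold model classifies correctly."""
    hits = sum(1 for score, label in rows if predict(threshold, score) == label)
    return hits / len(rows)

def fit_threshold(rows):
    """'Train' by choosing the observed score that maximizes accuracy on rows."""
    candidates = sorted({score for score, _ in rows})
    return max(candidates, key=lambda t: accuracy(t, rows))

def cross_validate(rows, k=5):
    """k-fold cross-validation: fit on k-1 folds, score on the fold left out."""
    scores = []
    for fold in range(k):
        test = rows[fold::k]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        scores.append(accuracy(fit_threshold(train), test))
    return sum(scores) / k

# Synthetic "recent production" data in which the true cut-off has drifted to 70.
recent = [(score, int(score >= 70)) for score in range(100)]

# Holdout evaluation: set aside every fourth row; train only on the rest.
holdout = recent[::4]
train = [r for i, r in enumerate(recent) if i % 4 != 0]

champion = 50                      # current model, fitted before the drift
challenger = fit_threshold(train)  # candidate retrained on recent data

cv_estimate = cross_validate(train, k=5)
champion_acc = accuracy(champion, holdout)
challenger_acc = accuracy(challenger, holdout)

# Promote the challenger only if it beats the champion on unseen holdout data.
winner = "challenger" if challenger_acc > champion_acc else "champion"
print(f"cross-validated estimate: {cv_estimate:.3f}")
print(f"champion: {champion_acc:.2f}, challenger: {challenger_acc:.2f}, keep: {winner}")
```

In this sketch, the cross-validated score estimates how well the retrained model generalizes before the holdout data is touched, and the holdout comparison decides whether the challenger replaces the champion. A production pipeline would substitute real models, real data and an agreed promotion rule.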
“Machine learning and artificial intelligence (AI) are optimizing everything from customer interactions to enterprise operations. As these applications evolve, data scientist and IT operation teams will need to move newly trained models into production faster than ever before, which can jeopardize model accuracy, collaboration and governance,” said John Magee, VP, product marketing, Hitachi Vantara. “Hitachi Vantara Labs’ machine learning model management provides improved algorithmic transparency and automation so application teams can focus their efforts on innovating rapidly without risking model deterioration.”
Product Availability and Resources
- Model management capabilities can be accessed in the Pentaho Marketplace beginning March 6, 2018. These plug-ins are currently unsupported and will be available for testing. In future versions, they may be integrated into Pentaho Data Integration (PDI). To learn more, visit Pentaho Labs.
About Hitachi Vantara