Machine learning: believe it or not?

Combining the right data science with scalable IT solutions brings benefits to traditional marine engineering.

Jaroslaw Nowak
Equipment Analytics Global Product Specialist, ABB Marine & Ports

Machine learning techniques have advanced considerably in recent years. The software that allows these novel methods to be applied to industrial datasets is now widely available. The wider availability of historical data is equally significant, strengthened by a digital infrastructure that allows information to be collected from operating vessels and stored in the cloud at relatively low cost.

In this arena of opportunity, digital leaders, technology managers, software developers and researchers working in the marine industry often struggle to make the right choices when answering questions such as:

  • Is the cloud technology offered by an IT company going to perform according to my technical and budgetary expectations?
  • Will machine learning methods used in medical research really help me solve my engineering problems?
  • Can I trust my domain knowledge and my intuition, or should I follow the advice and prognostics generated by a ‘black box’ algorithm?

ABB is a proven provider of state-of-the-art engineering solutions for the marine market. As such we deal with these questions on a daily basis. And as a company dedicated to incorporating modern technology trends into our domain, we are not afraid to think outside the box. Testing machine learning methods used in biostatistics to detect faults in our equipment is one example. Still, we want to proceed with a maximum understanding of the science behind the solution and guide our progress by using our knowledge as domain experts. In an effort to learn more about machine learning, ABB Marine & Ports Digital Service R&D team members have been educating themselves in data science methodologies, applying these to a few select technical cases, and deploying them with the support and use of different IT cloud platforms, in cooperation with internal and external IT teams.

The journey has begun, and it is already showing promise. Machine learning has become a strategic element in ABB Marine & Ports Digital Services, and we would like to share the lessons learned so far, in an attempt to stimulate discussions with our customers on this topic.

The technical cases presented here relate to data-driven modelling for the purpose of diagnostics and fault detection in marine equipment. In the specific context of the problems presented, we analyze the modelling techniques to be tested. The discussion is then followed by a practical study of how the developed models and scoring algorithms can be distributed and shared across multiple IT ecosystems, allowing the data processing pipeline to be adjusted to the needs of each individual project and customer request.

What is data science?

The answer to this question probably depends on who you ask. A quick online search might reveal the humorous attempt at an explanation shown in Figure 1. On a more serious note [1], if one asks statisticians or computer scientists about the difference between statistics and machine learning, stronger and more polarized opinions may be offered:

  • “Machine learning is essentially a form of applied statistics”
  • “Machine learning is glorified statistics”
  • “Machine learning is statistics scaled up to big data”
  • “The short answer is that there is no difference”

Or, even more provocatively:

  • “Machine learning is for Computer Science majors who couldn’t pass a Statistics course”
  • “Machine learning is Statistics minus any checking of models and assumptions”
  • “I don’t know what Machine Learning will look like in ten years, but whatever it is I’m sure statisticians will be whining that they did it earlier and better”

Figure 1

Again, following the article listed under References [1]: “The difference is about different goals and strategies.” It seems statisticians, when developing their models, are mostly interested in delivering a precise mathematical and statistical framework. Predictions from the model are not of primary importance; rather, “the analysis is a final product.”

At the same time, for machine learning practitioners, “the predominant task is predictive modelling. The proof of the model is a test set.” With fewer restrictions on proving the strict mathematical foundations of a model, machine learning users are free to choose from a larger set of models. In the case of so-called ‘black box’ techniques such as neural networks, or even random forests and boosted decision trees, knowing the mathematical principles behind the algorithms does not necessarily make it clear to which problems they should be applied. We may be very satisfied with the results of solving one problem, and yet disappointed and confused to see poor performance in a similar case with different data sets.

So where do industrial and marine engineers stand in this debate? Perhaps we are fortunate to represent disciplines based strongly on the laws of physics, which gives us access to analytical models. We may have to simplify the equations describing, for instance, power flow within the marine propulsion chain, but we still understand the main principles and the relations between the most critical measurements. In that sense we are privileged compared to those who apply themselves to the study of biostatistics, medicine or geology, to name a few. There, in the absence of an understanding of the complex mechanisms of the underlying processes, applying statistical or machine learning methods is the only way to indicate causality. The point is that engineers can enrich existing machine learning methods with prior knowledge about the expected mechanisms behind the modelled data. The knowledge can be introduced with proper labelling of data and cases, such as those described in the referenced article [2]. Knowledge can also be applied using well-known and proven mathematical techniques originating from other disciplines, such as process identification and control theory.

Estimating pressure drop rate in medium voltage drive cooling systems

The first case to be discussed applies the absolute basics of machine learning. It is about forecasting the point in time when the pressure of the internal cooling water in a medium voltage frequency converter (or drive) will hit its warning limit. The medium voltage frequency converter delivered by ABB is one of the critical components of an electric propulsion system, so developing analytics and predictions about the performance of the drive provides additional value for the customer. The frequency converter will function as expected if it is properly cooled. The closed water cooling system is characterized by a slow but constant drop in coolant pressure due to natural evaporation and normal leakage through pump seals. As a consequence, the cooling circuit must be regularly topped up with water. This maintenance work must be performed while the frequency converter, and consequently the entire propulsion chain, is shut down. The operation is therefore typically planned well in advance, and knowing the expected timing of the warning limit optimizes the planning process.

The machine learning model operating behind the scenes is the simplest possible: a univariate linear regression of the pressure signal itself. It does not model the influence of temperature, nor does it tell whether the pressure drop is as expected or accelerated and thus abnormal. Rather, within a dynamically adjusted time horizon, it captures the linear fit of the pressure drop and calculates the time at which it will intercept the warning limit (see Figure 2). As many gurus of statistics would say: “If the simple model works for its purpose – do not try to complicate it.”
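
A minimal sketch of that idea is shown below. The function name, units, horizon length and limit value are placeholders for illustration, not the parameters used in the actual product.

```python
import numpy as np

def predict_limit_crossing(hours, pressure, warning_limit, horizon_hours=14 * 24):
    """Fit a straight line to the pressure samples inside the recent time
    horizon and extrapolate when the fitted trend crosses the warning limit.
    Returns the predicted crossing time (in hours) or None if the pressure
    is not dropping."""
    hours = np.asarray(hours, dtype=float)
    pressure = np.asarray(pressure, dtype=float)

    # Keep only the samples inside the dynamically adjusted time horizon.
    mask = hours >= hours[-1] - horizon_hours
    slope, intercept = np.polyfit(hours[mask], pressure[mask], deg=1)

    if slope >= 0:
        return None              # no drop observed, nothing to extrapolate
    # Solve slope * t + intercept = warning_limit for t.
    return (warning_limit - intercept) / slope
```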

Figure 2

This simple model was implemented as a result of cooperation between ABB and the startup company Dutch Analytics, in an Artificial Intelligence accelerator program run by ABB in 2019. Dutch Analytics has also deployed this model on its highly scalable cloud platform, Xenia, and integrated it with the main data and presentation pipeline hosted by ABB. Details of various scenarios for IT deployment of machine learning models are discussed later in this article.

Analysis of medium voltage bus bar temperature

The second case is an example of time series data. Here, the number of covariates used in modelling is much larger, approximately 1,000 signals per vessel. The recorded data represent the relative temperatures of the medium voltage bus bars, detected by infrared sensors inside the medium voltage switchboard cabinets. In addition to temperature, the current at the circuit breaker output of the cabinet is also measured.

The goal is to build a model that helps detect abnormal temperatures in the system. The challenge is that this is a purely unsupervised case where no failure has yet been observed in the system. The approach is to construct a model from a training data set collected on a new system with no defects. Next, we use this model to track the difference over time between the model’s prediction and the actual measurements (the so-called residuum). Once the residuum-processing part of the algorithm indicates an abnormality, an early warning is sent to a human expert (in this case an ABB service engineer).
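
The sketch below illustrates that residuum-monitoring idea in its simplest form: a reference model fitted on healthy training data predicts bus bar temperature from load, and a rolling mean of the residuum is compared against a threshold. The linear reference model, window length and threshold are illustrative assumptions, not the algorithm used in the actual system.

```python
import numpy as np

def fit_healthy_model(load, temperature):
    """Fit a simple linear reference model T ~ a * load + b on defect-free data."""
    a, b = np.polyfit(load, temperature, deg=1)
    return a, b

def residual_alarm(model, load, temperature, window=50, threshold=2.0):
    """Track the residuum (measured minus predicted temperature) over time and
    raise an early warning when its rolling mean exceeds the threshold."""
    a, b = model
    residuum = temperature - (a * load + b)
    kernel = np.ones(window) / window           # rolling mean of the residuum
    rolling = np.convolve(residuum, kernel, mode="valid")
    return bool(np.any(rolling > threshold)), rolling

# Illustrative data only: the monitored period develops a 3 K abnormal offset.
rng = np.random.default_rng(0)
load_train = rng.uniform(100, 400, 2000)
temp_train = 0.05 * load_train + 20 + rng.normal(0, 0.5, 2000)

load_new = rng.uniform(100, 400, 1000)
temp_new = 0.05 * load_new + 20 + rng.normal(0, 0.5, 1000)
temp_new[600:] += 3.0                           # abnormal temperature rise

model = fit_healthy_model(load_train, temp_train)
alarm, _ = residual_alarm(model, load_new, temp_new)
print("early warning:", alarm)
```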

In this case, applying multivariate linear regression or even generalized additive models does not seem to provide immediate or positive results. This is mainly because time series data are represented differently than the typical data sets successfully modelled and predicted by mainstream statistical inference methods. For instance, in disciplines such as biostatistics or genomics, each data sample can be treated as a distinct observation by the model’s learning process. By contrast, time series data originating from industry include the dynamics of the process, e.g. transients between different states, and those transients should also be included in the model (see Figure 3).
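
As an illustration of what “including the transients” can mean in practice, the hypothetical sketch below augments a regression model with lagged values of the input so that the slow thermal response becomes visible to the learning process. The signals, lag count and first-order dynamics are assumptions for demonstration only.

```python
import numpy as np

def lagged_design_matrix(signal, n_lags):
    """Stack the current value and the n_lags previous values of an input
    signal into columns, so a regression can capture process dynamics."""
    rows = len(signal) - n_lags
    return np.column_stack(
        [signal[n_lags - k : n_lags - k + rows] for k in range(n_lags + 1)])

# Hypothetical data: temperature follows the load through a first-order lag.
rng = np.random.default_rng(1)
load = np.repeat(rng.uniform(0.0, 1.0, 50), 20)      # step-wise load profile
temp = np.zeros_like(load)
for i in range(1, len(load)):
    temp[i] = 0.9 * temp[i - 1] + 0.1 * load[i]      # slow thermal response

n_lags = 30
X = lagged_design_matrix(load, n_lags)               # dynamics as covariates
y = temp[n_lags:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("residual std with lagged covariates:", np.std(y - X @ coef))

# For comparison, a static fit on the instantaneous load leaves a noticeably
# larger residual, because the transients are not represented at all.
a, b = np.polyfit(load[n_lags:], y, deg=1)
print("residual std, instantaneous load only:", np.std(y - (a * load[n_lags:] + b)))
```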

Figure 3

The solution is to use methods for dynamic system identification, well known from process identification and process control studies. Here, however, strong analytical knowledge of the process generating the data is required. Describing the process with state space models, Laplace transforms or direct differential equations is quite cumbersome, often even impossible, but with some simplifications guided by domain knowledge, a marine engineer can combine deterministic and stochastic descriptions into something resembling a Kalman filter framework.
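
As a taste of that framework, the sketch below combines a deterministic first-order thermal model driven by I²-type heating with a stochastic noise description in a one-state Kalman filter. The model structure, coefficients and noise variances are illustrative assumptions, not those of the actual bus bar application.

```python
import numpy as np

def kalman_track(current, measured_temp, ambient=25.0,
                 a=0.98, b=5e-6, q=0.01, r=0.25):
    """One-state Kalman filter for bus bar temperature rise above ambient.

    Deterministic part:  x[k] = a * x[k-1] + b * current[k]**2   (thermal model)
    Stochastic part:     process noise variance q, measurement noise variance r
    Returns the filtered temperatures and the innovation (residuum) sequence,
    which can feed the abnormality detection described above.
    """
    x, p = 0.0, 1.0                    # initial temperature rise and covariance
    estimates, innovations = [], []
    for i_k, z_k in zip(current, measured_temp):
        # Predict with the deterministic model, inflate uncertainty by q.
        x = a * x + b * i_k ** 2
        p = a * a * p + q
        # Update with the measured temperature.
        innovation = z_k - (x + ambient)
        gain = p / (p + r)
        x += gain * innovation
        p *= (1.0 - gain)
        estimates.append(x + ambient)
        innovations.append(innovation)
    return np.array(estimates), np.array(innovations)

# Synthetic usage: a persistently large innovation would indicate that the
# measurements no longer follow the healthy thermal model.
rng = np.random.default_rng(2)
current = rng.uniform(200, 600, 500)
rise = np.zeros(500)
for k in range(1, 500):
    rise[k] = 0.98 * rise[k - 1] + 5e-6 * current[k] ** 2
measured = 25.0 + rise + rng.normal(0, 0.5, 500)
estimates, innovations = kalman_track(current, measured)
print("innovation std:", innovations.std())
```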

We strongly believe that there is huge potential to outperform typical statistical and machine learning methods using this approach.

For this second case, the ABB Marine & Ports Digital Service R&D team and a research group of statisticians and data scientists at the University of Oslo are conducting research to combine statistical methods with those specific to the process industry, in order to model cases such as those described in the previous sections of this paper. We believe this cooperation will produce applicable decision support systems that help optimize maintenance procedures for onboard marine equipment.

Model deployment tests in the cloud

Once the model is ready to consume newly measured data and produce predictions (scoring results), the IT architecture comes into play and a decision must be made on where the model is to be deployed. Again, there are many different scenarios, but in the interest of simplicity, we assume that the model can be run either:

  • At the edge, e.g. directly on site, where the data collection is taking place. As depicted in Figures 4 and 5, our model is to be part of an onboard data collection and remote diagnostic system (RDS)
  • And/or in the cloud, where we can leverage the latest IT technologies to handle big data, run scalable solutions and exchange data and models across different platforms owned by various stakeholders (system providers, ship operators, class societies)

Our focus in this case is on the cloud deployment architecture. As a test case, we selected time prediction of water top-up in the medium voltage drive’s cooling system. The team included data scientists from Dutch Analytics and ABB and architects from the ABB Ability™ Analytics platform, and was moderated by the ABB Marine & Ports Digital Service R&D team. The goal was to understand technical capabilities, required effort and cost of IT deployment and operation needed to run the same model within different cloud architecture stacks. Technologies selected include Xenia, a product of Dutch Analytics, utilizing Google cloud solutions and the ABB Ability™ Analytics platform based on Microsoft Azure cloud technology.

The model itself is implemented in Python and consists of two modules. First, the Data Cleansing module organizes raw data sets onto a common time grid and manages measurement gaps. It is followed by a Model Prediction module that first estimates the coefficients of the linear model using a specific time horizon, and then predicts the date when the warning limit is expected to be reached.
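
In rough outline, and with hypothetical function names, signal names and parameter values (the hourly grid, the gap limit and the 14-day horizon are assumptions, not the product’s settings), the two modules could look something like this:

```python
import numpy as np
import pandas as pd

def cleanse(raw: pd.DataFrame) -> pd.DataFrame:
    """Data Cleansing module: align raw samples (indexed by measurement
    timestamps) on a common hourly grid and bridge short measurement gaps."""
    gridded = raw.sort_index().resample("1H").mean()
    return gridded.interpolate(limit=24)          # fill gaps of up to one day

def predict(clean: pd.DataFrame, warning_limit: float, horizon_days: int = 14):
    """Model Prediction module: estimate the coefficients of a linear trend
    over the recent horizon and return the date the warning limit is reached."""
    window = clean[clean.index >= clean.index[-1] - pd.Timedelta(days=horizon_days)]
    window = window.dropna()
    hours = (window.index - window.index[0]).total_seconds() / 3600.0
    slope, intercept = np.polyfit(hours, window["pressure"].to_numpy(), deg=1)
    if slope >= 0:
        return None                               # no pressure drop to extrapolate
    return window.index[0] + pd.Timedelta(hours=(warning_limit - intercept) / slope)

# The two modules are chained into a single scoring call per new data batch:
#   prediction = predict(cleanse(raw_batch), warning_limit=1.8)
```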

The overall data and analytics pipeline is presented in Figure 4 and Figure 5, where the upper figure presents the solution deployed in the Xenia platform (case A) and the lower in the ABB Ability™ Analytics framework (case B).

Both have in common the way in which data are collected onboard and securely transferred in batch mode as compressed files into cold storage inside the ABB network. Next, the data are extracted and transformed into a plain text format that can be pushed either via a REST interface to Xenia blob storage, or, within the same ABB ecosystem, to Azure blob storage. The heart of the data processing and analytics in scenarios A and B is deployed into two different cloud ecosystems, yet they share common technologies such as Spark Databricks, relational databases and web services. Finally, the scoring results are presented in the fleet operating center dashboard.
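
Purely as an illustration of those two hand-over steps, and with placeholder endpoint, container and credential names rather than the actual interfaces used in the project, the push into blob storage might look like this:

```python
import requests
from azure.storage.blob import BlobServiceClient

def push_to_azure(file_path: str, connection_string: str, container: str) -> None:
    """Scenario B: upload the transformed plain-text file to Azure blob storage.
    The connection string and container name are deployment-specific placeholders."""
    service = BlobServiceClient.from_connection_string(connection_string)
    blob = service.get_blob_client(container=container, blob=file_path)
    with open(file_path, "rb") as handle:
        blob.upload_blob(handle, overwrite=True)

def push_via_rest(file_path: str, endpoint_url: str, api_token: str) -> None:
    """Scenario A: push the same file over a generic REST interface. The URL
    and authentication scheme are placeholders, not Xenia's actual API."""
    with open(file_path, "rb") as handle:
        response = requests.post(endpoint_url,
                                 headers={"Authorization": f"Bearer {api_token}"},
                                 files={"file": handle})
    response.raise_for_status()
```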

Xenia from Dutch Analytics is built on a modern micro-service based architecture, making it efficient at scaling and distributing computation loads using Kubernetes as an auto-scaling framework. The goal of Xenia is to provide a powerful abstraction layer for robust execution of data science code, ensuring efficient resource management without requiring that the user be a DevOps expert.

ABB Ability™ is the company’s unified, cross-industry digital offering – extending from device to edge to cloud – with devices, systems, solutions, services and a platform that enables customers to increase productivity and lower costs. ABB Ability™ was launched in 2017 and already offers more than 210 solutions.

Using essentially the same Python code implementing both the Data Cleansing and Model Prediction modules, the multidisciplinary team from Dutch Analytics and ABB was able to functionally prototype the same pipeline on two different IT platforms. Code and interface adaptation, meetings and discussions, and the final deployment tests of the prototype took around 80 hours in total for the entire team.

The result of the above exercise is more prototype than final product, but it gives an idea of how little effort is required, and in principle how easy it is to run similar proof-of-concept integration tests. Many lessons have been learned, covering aspects from pure data analytics to collaboration effectiveness, concluding with cloud IT technology specifics.

Figure 4: Dutch Analytics Xenia platform

Figure 5: ABB Ability™ Analytics platform

Learnings

For Jaroslaw Nowak, a member of the ABB Marine & Ports Digital Service R&D team whose role is to develop early prototypes that may be turned into products, the learnings can be summarized as follows:

  • With a team of experts in their domains focusing on delivering quick and tangible results, it is surprisingly easy to build various cloud data analytics pipelines.
  • A key factor behind the above is that each pipeline discussed in the text, even though deployed within IT frameworks offered by different suppliers, is based on the same or very similar technologies, such as Spark Databricks, Kubernetes, relational databases and web services.
  • From a business perspective, running such an exercise should be an initial, mandatory step in each analytics-in-the-cloud integration type of project. This is when parties such as ABB, with its end-to-end solutions, customers with a desire to know more about their assets, and third-party data scientists or platform providers already part of an existing IT ecosystem, are asked to collaborate and build an integrated solution.
  • Modelling and diagnostics based merely on applying known machine learning methods to sensor data as they come may not provide answers to questions such as whether equipment is in a faulty state. It seems that organizing and properly labelling data, combining multiple sources of information (sensor data, maintenance log data), and numerous discussions with marine domain experts to define the algorithm’s goals are all activities fundamental to the proper selection and synthesis of machine learning algorithms. Without these, the quality of the predictions given by the algorithm itself may be very poor.

For the Dutch Analytics team, represented by CTO Victor Pereboom and Enrique Guiterrez Neri, a data engineer with a key role in the implementation of this exercise, “working together with the Marine & Ports Division of ABB was a great opportunity to validate the performance of our platform and get feedback on its functionality. Collaborating on both the analytics part as well as the deployment shows how well platforms like ABB Ability™ and our own Xenia platform can help in shortening the time to market of the end solution, providing more room for investment in the actual algorithm development itself.”

Finally, for Felix Mutzl, Machine Learning Solution Engineer from ABB Ability™ Analytics team, “this exercise provided an opportunity to help put the analytics framework’s capabilities into practice in a real-world use case. On top of that we enjoyed the fruitful discussions and constructive collaboration with both our colleagues from ABB Marine & Ports as well as the Dutch Analytics team.”

Join us on this journey

At ABB we try to understand the processes that generate data before we apply analytical or machine learning methods, in order to avoid building false models that produce misleading predictions. Statistics and machine learning methods are powerful by themselves, but enriched with domain-specific process identification knowledge, they may provide much stronger results and interpretability. If your data scientists analyzing data from a critical electric motor report a very low correlation between the winding temperature signal and the motor current signal, feel free to contact the data engineers at ABB and join our journey. We know from engineering design principles that the current-temperature relation is non-linear, and that checking a simple correlation matrix may lead to incorrect model selection and very poor predictions.
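
The point can be reproduced with a purely illustrative simulation (the numbers below are invented, not measured data): when the winding temperature responds to I²-type heating through a slow thermal time constant, the instantaneous Pearson correlation between current and temperature comes out low even though the physical dependence is strong and fully known.

```python
import numpy as np

rng = np.random.default_rng(3)
current = rng.uniform(100, 500, 20000)          # rapidly varying load current (A)
temp_rise = np.zeros_like(current)
for k in range(1, len(current)):
    # Non-linear heating (proportional to I^2) filtered by a slow thermal lag.
    temp_rise[k] = 0.995 * temp_rise[k - 1] + 0.005 * (current[k] / 100.0) ** 2

# The correlation matrix suggests a weak relationship, yet the dependence
# between current and temperature is deterministic by construction.
print(np.corrcoef(current[1000:], temp_rise[1000:])[0, 1])
```

A model selected on the basis of that correlation matrix alone would discard one of the most informative signals in the system.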

References

[1] Fawcett, T., Hardin, D. (2017). Machine Learning vs. Statistics. Silicon Valley Data Science. Retrieved from https://www.svds.com/machine-learning-vs-statistics

[2] Stakkeland, M., Lee, B. W., Parrondo, Y. (2019). Anticipating technical faults with machine learning. Retrieved from https://new.abb.com/marine/generations/anticipating-technical-faults-with-machine-learning
