Anticipating technical faults with machine learning

Failures in critical onboard machinery can result in production loss or delays or, even worse, endanger the ship and its crew. For the past decade, ABB Ability™ Marine Remote Diagnostic Systems have been collecting data to provide technical support when needed, as well as time-based reports on the status and operation of equipment onboard the vessel. With the volume of data collected, owners are able to augment onboard crews’ skills and increase the efficiency of fleetwide operations, adding value on a daily basis.

^{Morten Stakkeland, Data Scientist, ABB Marine & Ports}

^{Bo Won Lee, Product Manager, Propulsion Control Unit, ABB Marine & Ports}

^{Yago Parrondo, Product Area Manager, Smart Asset Management, ABB Marine & Ports}

By analyzing the vast amount of information collected over the years and applying machine learning, the Advance Analytics team in Smart Asset Management helps key marine players make sound decisions based on data. The team's results are also used to improve and optimize ABB products and digital solutions by adjusting and enhancing current designs, for example the thermal protection used in the Propulsion Control Unit application in ABB Ability™ System 800xA.

The recently developed prediction model represents the motor cooling unit independently of the PT100 motor winding temperature measurements. All data required by the model is collected by the Remote Diagnostic System (RDS) from the frequency converter (actual power and actual torque signals) and the propulsion motor cooling system (cooling air temperature signal). The prediction model is coded in the Propulsion Control Unit (PCU) controller application and calculates the “predicted motor winding temperature”.

The average measured winding temperature is compared with the predicted temperature across the whole range of motor temperatures. If a deviation persists, the function notifies the alarm system or reduces the propulsion power.

This prediction model can be configured and extended to other heavy machinery applications, such as propulsion transformers or diesel generators.

The application

The scope of the work was to develop a thermal protection function for high performance marine propulsion motors. A generic overview of the system can be found in Figure 1. This configuration is frequently used in marine propulsion systems, where an onboard fresh water cooling system removes the heat produced by the propulsion motor. The motor is cooled by an air cooling loop, in which air is circulated by one or more fans. In a heat exchanger, heat is transferred from the hot air to the fresh water cooling system.

The temperature of the air is measured at the inlet and outlet of the heat exchanger. The rotation speed of the motor is either measured directly or provided by the Propulsion Control Unit (PCU), a controller application integrated with the propulsion frequency converter that controls the speed and power of the propulsion motors. The PCU also calculates and provides the mechanical torque and electrical power. More instrumentation may be available on some vessels, but a minimum set of signals was chosen for training.

Until now, all propulsion winding temperature protection has been based on PT100 temperature sensors physically mounted on the motors. A fixed limit in the propulsion control software (typically 155 °C) provides protection only at a single critical temperature level and does not cover the temperature range below that set point. The efficiency and accuracy of the protection system also depend on where the PT100 sensors are mounted, since the hottest point of the windings may lie elsewhere.

The main target of the development work was to implement a thermal protection function that can detect an abnormal state before the high-high (HH) temperature limit is reached, since past experience shows that the motor may already be damaged at that temperature. Given the critical function of the high performance marine propulsion motors considered here, the ability to detect failures at an early stage adds value for customers.

The people

The effort to develop the thermal protection function involved people from several sections within ABB Marine, as well as cooperation with external academic partners through the Big Insight project, where ABB is a funding industrial partner. ABB contributes to the Sensors Systems work package by selecting research tasks, providing data, and actively taking part in the work. Morten Stakkeland is employed part-time in the department of Statistics at the University of Oslo as Associate Professor II, while Jaroslaw Nowak has started working on an industrial PhD in close cooperation with the project.

The bulk of the development work within ABB has been performed by the analytics team in the Marine Digital Service department. The main function of the team is to extract knowledge from the existing pool of past data collected by RDS and other sources, and to develop the digital functions that will be sold to our customers in the future. Jaroslaw Nowak had a key function in the early stage of the project, delivering the problem definition and signal selection and establishing a data pipeline, while Morten Stakkeland has supported most of the cross-team academic work. In this project, researchers from the Norwegian Computing Center provided support with statistical analysis.

The implementation and adaptation of the developed algorithms to the PCU platform have been led and carried out by Bo Won Lee from the Technology department.

The data

All data has been collected by the ABB Ability™ Marine Remote Diagnostic System (RDS). For all the propulsion motors in the dataset, the following measurements were collected:

Air temperature on both sides of the heat exchanger
Mechanical Torque
Speed
Power
Redundant temperature measurements on each winding

Two datasets were collected, from two separate classes of vessels and propulsion motors. The first dataset consisted of data from two vessels with two propulsion motors on each vessel, and the duration of the dataset was approximately one year. The second dataset consisted of data from five individual vessels, with durations varying from two months to two years. One of the datasets also included data from sea trials.

The RDS was configured to collect data synchronously every minute, which means that each signal was collected approximately once per minute, independent of the signal state. Note that the sampling times of one signal do not necessarily coincide with the sampling times of the other signals. In addition, so-called asynchronous loggers were configured to sample data whenever a signal changed by more than a configurable limit. These loggers hence sample data at irregular intervals, and more frequently during periods with high dynamics. An illustration of how the synchronous and asynchronous samplers interact is given in the following figure. Here a sync sampler is configured to sample every 100 s, while an async sampler stores a timestamped value if the signal changes by more than d = 20.

Figure 2: The ability to detect abnormal heat generation in the motor below the HH limit is facilitated through modeling the generic system
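The interaction between the two logger types can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RDS implementation: the function names, the synthetic 1 Hz ramp signal, and the choice to compare each value against the last stored value are hypothetical. Only the 100 s period and the limit d = 20 come from the text.

```python
def sync_sample(times, values, period):
    """Keep the first sample at or after each multiple of `period` seconds."""
    samples, next_t = [], times[0]
    for t, v in zip(times, values):
        if t >= next_t:
            samples.append((t, v))
            next_t += period
    return samples


def async_sample(times, values, deadband):
    """Store a timestamped value whenever the signal has moved more than
    `deadband` away from the last stored value (assumed logger behavior)."""
    samples = [(times[0], values[0])]
    for t, v in zip(times[1:], values[1:]):
        if abs(v - samples[-1][1]) > deadband:
            samples.append((t, v))
    return samples


# Synthetic 1 Hz signal: flat at 0, ramp of 1 unit/s from t = 200 s, flat at 100.
times = list(range(0, 600))
values = [0.0 if t < 200 else min(100.0, (t - 200) * 1.0) for t in times]

sync = sync_sample(times, values, period=100)
async_ = async_sample(times, values, deadband=20)

print("sync timestamps: ", [t for t, _ in sync])
print("async timestamps:", [t for t, _ in async_])
```

In this synthetic case the async logger stores nothing while the signal is flat and adds four extra points during the ramp, which is the period a 100 s sync logger resolves poorly.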

The Modeling

The ability to detect abnormal heat generation in the motor below the HH limit is facilitated through modeling the generic system in Figure 2. However, rather than using a classical engineering approach by modeling the cooling loop using a system of differential equations or the equivalent, a data driven approach was applied. Past data together with machine learning and statistical modeling were used to derive the relationship between system inputs and outputs (winding temperatures). The derived model is then used to detect deviations from the normal case – in this case overheating.

Using data driven modeling and machine learning offers several potential benefits. Firstly, a data driven model is not dependent on specific domain or application knowledge, like the ability to model the heat transfer within the propulsion motor itself. A model can thus be created without access to experts possessing this knowledge, internally or externally. Secondly, a data driven model has the potential to capture effects or correlations that are unknown to the experts creating a physics-based model. It is to some extent less dependent on prior knowledge, but on the other hand these effects need to be captured in the training data.

The training data is limited in the sense that data from failure cases is scarce at best. Using an out-of-the-box machine learning algorithm to train a classifier algorithm that can separate faults from non-faults is hence difficult or impossible. This is a general challenge in marine applications, as well-tagged failure data is rarely available. One reason is, of course, that some failures are rare by nature, but exchange of data between companies is also rare, and there is a widespread lack of correctly identified and labeled fault data. The chosen approach in this work was thus to model the system using regression analysis, and to use knowledge and physics-based modeling of fault effects.

As the final protection function is to be implemented in the PCU, the complexity of the model and the memory requirements need to be adapted to the real time requirements of the final application. The flow is shown in Figure 3.

Figure 3: As the final protection function is to be implemented in the PCU, the complexity of the model and the memory requirements need to be adapted to the real time requirements of the final application.

Exponentially Weighted Moving Average (EWMA) Models

Given the physics of the system, the temperature of the windings cannot be modeled from the instantaneous inputs alone. The history of the system needs to be taken into account. Heating up a block of metal takes some time, even when running at full power.

EWMA models were used to take past system values into account. An EWMA model can be characterized by the following equation:

y_k = θ·x_k + (1 − θ)·y_(k-1)

Here y_k is the output of the EWMA model at time step k, y_(k-1) is the value of the EWMA at the previous time step, and x_k is the input variable. θ is a design variable that determines the time constant of the system. The relationship between the factor θ, the time constant τ, and the sampling interval Δ is given by θ = Δ/(Δ + τ). In practice, the EWMA model is a first order low pass filter with time constant τ. EWMA models have several benefits. The first is their recursive nature, which requires little memory and is relatively simple to implement in an industrial real time system. In addition, a system that can be characterized by a first order ordinary differential equation can be closely approximated by an EWMA model. The model is hence a decent approximation for many physical systems, including heat transfer models.
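As a minimal sketch of the recursion described above, the following Python snippet implements an EWMA filter that stores only its previous output. The class and function names are illustrative, and the 60 s sampling interval and 30 minute time constant are hypothetical values chosen for the example, not parameters from the project.

```python
def ewma_factor(delta, tau):
    """Smoothing factor theta = delta / (delta + tau)."""
    return delta / (delta + tau)


class Ewma:
    """Recursive EWMA: y_k = theta * x_k + (1 - theta) * y_(k-1)."""

    def __init__(self, theta, initial=0.0):
        self.theta = theta
        self.y = initial  # the only state that has to be kept in memory

    def update(self, x):
        self.y = self.theta * x + (1.0 - self.theta) * self.y
        return self.y


# Hypothetical numbers: 60 s sampling interval, 1800 s (30 min) time constant.
theta = ewma_factor(delta=60.0, tau=1800.0)
ewma_filter = Ewma(theta, initial=0.0)

# Response to a unit step on the input, e.g. load going from 0 to 1.
step_response = [ewma_filter.update(1.0) for _ in range(200)]

print(f"theta = {theta:.4f}")
print(f"after one time constant (30 steps): {step_response[29]:.3f}")
```

After one time constant the output has covered roughly 63 % of the step, which is the behavior expected of a first order low pass filter.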

The training

The model was trained using regression analysis, fitting a model to the data. An overview of the workflow can be seen in Figure 4. A number of different models were tested, including different combinations of inputs and EWMA models with different time constants.

Figure 4: The model was trained using regression analysis, fitting a model to the data.

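The performance of the monitors was evaluated using a measure of accuracy called the Root Mean Square Error (RMSE), which in its standard form is given by the following equation, where T_k is the measured winding temperature at time step k, T̂_k is the corresponding model prediction, and N is the number of samples:

RMSE = sqrt( (1/N) · Σ_(k=1..N) (T_k − T̂_k)² )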
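For illustration, the RMSE can be computed as in the short Python sketch below. It uses the standard definition of the measure. The function name and the four temperature values are made up for the example and are not project data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical winding temperatures in degrees C.
measured = [80.0, 85.0, 90.0, 95.0]
predicted = [82.0, 84.0, 93.0, 95.0]
error = rmse(measured, predicted)
print(f"RMSE = {error:.3f}")
```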

Note that in the final application, we are not very concerned with the RMSE, but rather with the following parameters:

The false alarm rate
The probability of missed detection given a fault
The time to detection given a fault

The RMSE is still a useful measure of the accuracy of the model, even though the three parameters also depend on other statistical properties of the model residual.

Several similar models and parameter combinations were shown to provide similar accuracy in the sense of RMSE. However, the following model was selected based on the following criteria:

Minimize the number of variables
Use physical knowledge where available: heat generation is known to be a function of torque squared

The selected model is given by the following equation, where the inputs are described in the table below.

A lagged variable here means that the variable is used as input to an EWMA model.

Note that the structure of the model is relatively simple, and only the values of the EWMA models at the previous time steps are stored in memory. This means that the model and monitor can be implemented without requiring significant computational power or memory.
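The general idea of a regression on lagged variables can be sketched as follows. This is not the selected model. The feature set (an intercept, the cooling air temperature, and an EWMA of torque squared), the smoothing factor, the synthetic signals, and the "true" coefficients 10, 1 and 60 are all assumptions invented for the demonstration. The least squares fit is written out in plain Python so that the example has no dependencies.

```python
import math


def ewma(xs, theta, initial=0.0):
    """Run y_k = theta * x_k + (1 - theta) * y_(k-1) over a sequence."""
    out, y = [], initial
    for x in xs:
        y = theta * x + (1.0 - theta) * y
        out.append(y)
    return out


def solve(a, b):
    """Solve a small dense system a w = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve(xtx, xty)


# Synthetic, hypothetical signals (not vessel data).
steps = range(2000)
torque = [0.5 + 0.5 * math.sin(k / 50.0) for k in steps]       # per-unit torque
air_temp = [40.0 + 5.0 * math.cos(k / 80.0) for k in steps]    # cooling air, deg C
lagged_torque_sq = ewma([t * t for t in torque], theta=0.05)    # lagged variable

# Invented relationship, used only to generate targets for the demo.
winding_temp = [10.0 + 1.0 * a + 60.0 * q
                for a, q in zip(air_temp, lagged_torque_sq)]

rows = [[1.0, a, q] for a, q in zip(air_temp, lagged_torque_sq)]
coefficients = fit_least_squares(rows, winding_temp)
print([round(c, 4) for c in coefficients])
```

At run time such a model needs only the current inputs, the fitted coefficients, and the previous EWMA output, which matches the low memory footprint noted above.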

An example of comparison between the predicted and actual temperatures on a motor, with data recorded during sea trial, is shown in Figure 5.

Figure 5: Comparison between the predicted and actual temperatures on a motor, with data recorded during sea trial

The plot shows that the prediction stays close to the measured winding temperatures, with an error smaller than 10 K. The data and corresponding estimates are from a sea trial of a motor that was not included in the training data.

In the verification phase, leave-ship-out and leave-motor-out analyses were used to verify that the model was not overfitted to the training data. When running a leave-motor-out analysis, a single motor was left out of the training data when estimating the parameters of the model. The model was then tested on the left-out motor to check the degree of fit. In addition to the leave-ship-out and leave-motor-out analyses, a new dataset will be collected from a new vessel and used as a verification dataset.
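A leave-motor-out loop can be sketched in Python as below. The motor names, the data points, and the use of a one-variable straight-line fit in place of the actual regression model are assumptions made to keep the example short. The third motor is deliberately constructed to run 5 K hotter than the others, so the held-out error exposes it.

```python
import math


def fit_line(points):
    """Closed-form simple linear regression y = a + b * x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    b = sxy / sxx
    return my - b * mx, b


def rmse(model, points):
    """RMSE of the line `model` = (a, b) on the given points."""
    a, b = model
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in points) / len(points))


def leave_one_group_out(groups):
    """For each group, train on all other groups and score on the held-out one."""
    scores = {}
    for held_out in groups:
        train = [p for name, pts in groups.items() if name != held_out for p in pts]
        scores[held_out] = rmse(fit_line(train), groups[held_out])
    return scores


# Hypothetical data: (lagged load feature, winding temperature in deg C) per motor.
groups = {
    "motor_A": [(0.0, 40.0), (0.5, 70.0), (1.0, 100.0)],
    "motor_B": [(0.2, 52.0), (0.6, 76.0), (0.9, 94.0)],
    "motor_C": [(0.1, 51.0), (0.4, 69.0), (0.8, 93.0)],  # about 5 K hotter
}
scores = leave_one_group_out(groups)
for name, score in scores.items():
    print(f"{name}: held-out RMSE = {score:.2f} K")
```

A leave-ship-out analysis follows the same pattern with the vessel, rather than the motor, as the grouping key.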

The monitor

Based on the implemented model, a monitor was deployed using a simple threshold. If the actual temperature minus the predicted temperature exceeded the threshold, protection functions were initiated.

The monitor was implemented with one model and one set of parameters per motor class, rather than adapting a separate model and monitor to each individual vessel or motor. With this approach, no additional training data is needed for new vessels. However, long-term monitoring of new data is expected to be implemented in Azure.

Note that an integrating counter was implemented in order to deal with outliers of short duration in the data.

The monitor should be disabled during periods of zero power. If the fans in the air cooling loop are turned off, then the motor will take on ambient temperature, which in rare cases may trigger the monitor if active.
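The monitor logic described in this section can be sketched as follows. The exact form of the integrating counter is not given in the text, so an up/down counter is assumed here: it increments while the residual exceeds the threshold, decrements otherwise, and raises the alarm once it reaches a trip count. The class name, the threshold of 10 K, the trip count of 3, and the sample values are all hypothetical.

```python
class ThermalMonitor:
    """Threshold monitor on (measured - predicted) with an up/down counter."""

    def __init__(self, threshold, trip_count, min_power=0.0):
        self.threshold = threshold    # allowed residual, in kelvin
        self.trip_count = trip_count  # counter value that raises the alarm
        self.min_power = min_power    # monitor is disabled at or below this power
        self.counter = 0

    def update(self, measured, predicted, power):
        """Process one sample and return True if the alarm is active."""
        if power <= self.min_power:
            # Disabled during periods of zero power; forget accumulated evidence.
            self.counter = 0
            return False
        if measured - predicted > self.threshold:
            self.counter += 1
        else:
            self.counter = max(0, self.counter - 1)
        return self.counter >= self.trip_count


monitor = ThermalMonitor(threshold=10.0, trip_count=3)
predicted = 80.0
# A single outlier (95.0) followed later by a sustained deviation.
measured_series = [82.0, 95.0, 83.0, 92.0, 94.0, 96.0, 98.0]
alarms = [monitor.update(m, predicted, power=1.0) for m in measured_series]
print(alarms)
```

In this example the isolated outlier does not raise an alarm, while the sustained deviation does after three consecutive exceedances.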

Result

The threshold of the monitor could be set such that not a single false alarm was generated in the training data. Based on this threshold and a realistic failure model, the monitor can be shown to detect failures more than one hour before the critical trip limit is reached, and at temperatures significantly below the high temperature trip limit.
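One simple way to pick such a threshold is to take the largest residual seen in fault-free data and add a margin, then check how quickly a simulated fault crosses it. The sketch below illustrates only that idea. The residual values, the 1 K margin, and the linear fault ramp of 0.5 K per sample are hypothetical and do not represent the failure model or the results of the project.

```python
def no_false_alarm_threshold(residuals, margin=0.0):
    """Smallest threshold that a plain 'residual > threshold' test never
    exceeds on the given fault-free residuals, plus an optional margin."""
    return max(residuals) + margin


def first_detection(residuals, threshold):
    """Index of the first sample whose residual exceeds the threshold, or None."""
    for k, r in enumerate(residuals):
        if r > threshold:
            return k
    return None


# Hypothetical fault-free residuals (measured minus predicted, in kelvin).
training_residuals = [-3.0, 1.5, 4.0, -2.0, 6.5, 0.5]
threshold = no_false_alarm_threshold(training_residuals, margin=1.0)

# Hypothetical fault: extra heating that grows by 0.5 K per sample.
faulty_residuals = [0.5 * k for k in range(40)]
detected_at = first_detection(faulty_residuals, threshold)

print(f"threshold = {threshold} K, fault detected at sample {detected_at}")
```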

Current status

As mentioned in a previous section, a monitor has been trained for two classes of vessels. A monitor with an integrating detection function has been implemented in the PCU, and tested in an on-shore simulation.

The model and monitor have been implemented in the PCU on a vessel that is currently on its way to sea trial. The monitor is implemented in a test mode, in the sense that no safety functions are activated. It will remain in test mode during a validation period, in which the performance of the model is monitored in the actual setting and a validation dataset is collected. During the validation period, the RDS logs both the inputs and the outputs of the model and monitor at high temporal resolution.

Future work

On the research side, some effort is expected to go into developing individual models for each motor, and into investigating how prior data collected from other motors and vessels can be used to train the model using a minimal amount of data. Further development of generic fault models based on faults from similar motors will also be investigated.

On the implementation side, the focus will be on building automatically updated digital twin models in Azure. The digital twin implementation allows different metrics and analytics to be run on the dataset, for instance long-term monitoring of the model.

Some additional parameters should also be included in the modeling where available, for instance the cooling water temperature and the number of running cooling fans and their respective speeds. Adding instrumentation to new constructions to improve the model should also be considered.

In addition, the regression modeling will be extended and adapted to other systems.

Conclusions

Data collected by the RDS has been used together with machine learning to develop a motor temperature monitor. The monitor can detect faults well before existing safety functions are triggered.

Machine learning and statistical modeling are powerful tools for developing equipment models, that is, digital twin models that simulate the normal function of a local system or piece of equipment.