Abstract:
Marine environmental monitoring refers to the long-term, continuous measurement of seawater parameters such as chlorophyll a, sea surface temperature, dissolved oxygen, and nutrients (for example, total phosphorus and total nitrogen) using various marine environmental monitoring instruments. It plays an important role in exploring the laws of the ocean and protecting the ecological environment. However, owing to the complex marine environment and the limitations of the monitoring instruments, raw monitoring data, especially continuous real-time data collected in situ, often contain anomalous and missing values that seriously degrade data quality. Preprocessing of the raw data is therefore necessary to obtain high-quality data. Starting from the causes of data quality problems, this paper analyses two typical data quality issues (anomalous data and missing data) and summarizes recent research progress on methods for handling anomalous and missing marine environmental monitoring data, including mathematical model analysis methods, statistical analysis methods, frequency-domain and time-domain methods, and machine learning methods. Based on a comparison of the principles, applicability, advantages, and disadvantages of these methods, directions for further research are suggested: integrated models, improved detection timeliness, model adaptability to data, and on-site processing capability. In general, traditional preprocessing methods rely mainly on manual, subjective modelling or on summarized empirical rules to identify anomalous data and fill in missing data. They work well on smooth, regular data but handle abrupt changes and continuous anomalies poorly. By introducing intelligent algorithms such as decomposition methods, machine learning, and deep learning, emerging preprocessing methods can not only capture the long-term regular characteristics of the data but also mine small-scale variations through deeper learning. They can identify long-term continuous anomalies, accurately detect abrupt changes through data prediction, and thereby fill in the missing parts of the data. However, the use of intelligent algorithms still faces many limitations, such as insufficient training data, difficulty in obtaining optimal parameters, and high demand for hardware computing power.
With breakthroughs in frontier technologies such as chip technology and intelligent algorithms, preprocessing methods for marine environmental monitoring data are expected to develop towards high precision, low model complexity, and strong terminal deployment capability.
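As a rough illustration of the statistical analysis route summarized above, the following sketch flags anomalies with a rolling median and median absolute deviation (a Hampel-style filter) and then fills missing or flagged samples by linear interpolation. It is not taken from any of the reviewed works: the function names `hampel_flags` and `fill_linear`, the window size, the threshold, and the sample values are all assumptions chosen for illustration, and the series is assumed to be uniformly sampled.

```python
from statistics import median


def hampel_flags(values, half_window=3, n_sigmas=3.0):
    """Mark suspected anomalies in a series (hypothetical helper).

    A sample is flagged when it deviates from the median of its
    neighbourhood by more than n_sigmas * 1.4826 * MAD, where MAD is the
    median absolute deviation of that neighbourhood. Missing samples
    (None) are never flagged and are ignored when building the window.
    """
    flags = [False] * len(values)
    for i, v in enumerate(values):
        if v is None:
            continue
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = [x for x in values[lo:hi] if x is not None]
        if len(window) < 3:
            continue  # too few neighbours for a meaningful estimate
        med = median(window)
        mad = median(abs(x - med) for x in window)
        # 1.4826 scales the MAD to a standard deviation for Gaussian data.
        # If mad is 0, any sample that differs from the median is flagged.
        if abs(v - med) > n_sigmas * 1.4826 * mad:
            flags[i] = True
    return flags


def fill_linear(values, flags):
    """Fill missing or flagged samples by linear interpolation.

    Assumes uniform sampling. Gaps at the edges of the series copy the
    nearest valid value instead of extrapolating.
    """
    good = [i for i, v in enumerate(values) if v is not None and not flags[i]]
    if not good:
        raise ValueError("no valid samples to interpolate from")
    good_set = set(good)
    filled = list(values)
    for i in range(len(values)):
        if i in good_set:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


# Made-up sea surface temperature readings (degrees Celsius) with one
# spike and one missing sample; the values are illustrative only.
sst = [18.1, 18.2, 18.3, 25.9, 18.5, None, 18.7, 18.8]
flags = hampel_flags(sst)
filled = fill_linear(sst, flags)
print(flags)
print(filled)
```

A filter of this kind only suits smooth, regular series. As noted above, abrupt but genuine changes and long runs of anomalous values are exactly where such rule-based approaches tend to fail, which is the motivation for the prediction-based methods discussed in the paper.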