
Research progress on methods for handling abnormal and missing data in marine environmental monitoring

Abstract: Marine environmental monitoring is the long-term, continuous measurement of seawater parameters such as chlorophyll a, sea surface temperature, dissolved oxygen, and nutrients (total phosphorus, total nitrogen, etc.) using various monitoring instruments. It plays an important role in understanding ocean processes and protecting the ecological environment. However, because of the complex marine environment and the limitations of the instruments, raw monitoring data, especially continuous data collected in situ in real time, often contain anomalies and gaps that seriously degrade data quality. Preprocessing the raw data is therefore necessary to obtain high-quality marine environmental monitoring data. Starting from the causes of data quality problems, this paper analyses two typical problems (abnormal data and missing data) and summarizes recent research progress on methods for handling them in marine environmental monitoring data, including mathematical model analysis methods, statistical analysis methods, frequency-domain and time-domain methods, and machine learning methods. By comparing the principles, applicability, advantages, and shortcomings of these preprocessing methods, the paper offers suggestions for future work on ensemble models, detection timeliness, model adaptability to data, and on-site processing capability. In general, traditional preprocessing methods rely mostly on manual, subjective modelling or on empirical rules to identify abnormal data and fill in missing data. They work well for data with good stationarity and regularity, but handle abrupt changes and sustained anomalies poorly. Emerging preprocessing methods introduce decomposition methods, machine learning, deep learning, and other intelligent algorithms. These methods capture the long-term regularities of the data and, through deeper learning, also extract small-scale variation features, so they can identify long-running continuous anomalies, accurately detect abrupt changes through data prediction, and fill in the missing parts of the data. However, intelligent algorithms still face many limitations, such as insufficient training data, difficulty in obtaining optimal parameters, and high demand for hardware computing power. With breakthroughs in chip technology, intelligent algorithms, and other frontier technologies, preprocessing methods for marine environmental monitoring data will develop towards high accuracy, low model complexity, and strong capability for deployment on terminal devices.
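
To make the two preprocessing tasks concrete, the following is a minimal sketch of one traditional statistical approach: a Hampel-style filter (rolling median and median absolute deviation) to flag abnormal points, followed by linear interpolation to fill gaps. It is an illustration only and is not taken from the paper. The function names, the window size, the threshold, and the sample temperature values are all assumptions chosen for the example.

```python
from statistics import median


def hampel_flags(values, half_window=3, threshold=3.0):
    """Flag points that lie far from the local median (Hampel-style filter).

    Missing samples are given as None; they are ignored and never flagged.
    half_window and threshold are illustrative defaults, not tuned values.
    """
    flags = [False] * len(values)
    for i, v in enumerate(values):
        if v is None:
            continue
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = [x for x in values[lo:hi] if x is not None]
        if len(window) < 3:
            continue  # too few neighbours to judge this point
        med = median(window)
        mad = median([abs(x - med) for x in window])
        scale = 1.4826 * mad  # MAD scaled to a std-dev estimate (Gaussian assumption)
        if scale > 0 and abs(v - med) > threshold * scale:
            flags[i] = True
    return flags


def fill_gaps_linear(values):
    """Fill None entries by linear interpolation between the nearest valid samples.

    Leading and trailing gaps stay None because no extrapolation is attempted.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            filled[i] = values[a] + t * (values[b] - values[a])
    return filled


def preprocess(values, half_window=3, threshold=3.0):
    """Flag anomalies, treat them as missing, then fill all gaps."""
    flags = hampel_flags(values, half_window, threshold)
    masked = [None if f else v for v, f in zip(values, flags)]
    return flags, fill_gaps_linear(masked)


if __name__ == "__main__":
    # Hypothetical sea surface temperature series (deg C) with one spike and one gap.
    sst = [15.0, 15.1, 15.2, 25.0, 15.4, None, 15.6, 15.7]
    flags, cleaned = preprocess(sst)
    print(flags)
    print(cleaned)
```

A sketch like this shares the weakness the abstract attributes to traditional methods: when abnormal values persist for longer than about half the window, the local median itself shifts and the anomaly is no longer flagged.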

     
