Research on a precipitable water vapor prediction method based on the LightGBM algorithm
Abstract: Precipitable water vapor (PWV) is the total water vapor contained in a vertical column of unit cross-sectional area extending from the Earth's surface to the top of the troposphere, expressed as the depth of liquid water it would yield if fully condensed; it reflects the concentration of water vapor in the atmosphere. In this study, data from seven radiosonde stations in the Yangtze River Delta region from 2014 to 2019 were used to analyze the correlations between PWV and zenith tropospheric delay (ZTD), zenith hydrostatic delay (ZHD), zenith wet delay (ZWD), water vapor pressure (Es), atmospheric pressure (Ps), surface temperature (Ts), and weighted mean temperature (Tm). A PWV prediction model for the Yangtze River Delta region was then built on the Light Gradient Boosting Machine (LightGBM), and its prediction accuracy was evaluated. The results show that the correlation coefficients (R) between PWV and Tm, Ts, Ps, Es, ZHD, ZWD, and ZTD were 0.74, 0.76, –0.59, 0.76, –0.43, 1.00, and 0.94, respectively. The average biases of the yearly, seasonal, and monthly LightGBM-PWV models were 0.10 mm, 0.11 mm, and 0.12 mm, respectively, and their root mean square errors (RMSE) were 0.25 mm, 0.26 mm, and 0.31 mm. Accuracy thus decreased from the yearly to the seasonal to the monthly model, which differs from the behavior of traditional linear-fitting PWV models. The yearly LightGBM-PWV model was the most accurate and can be used for GNSS-PWV prediction, analysis, and research in the Yangtze River Delta region.
Key words:
- LightGBM
- precipitable water vapor (PWV)
- Yangtze River Delta region
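The three accuracy measures quoted in the abstract (correlation coefficient R, bias, and RMSE) can be computed as in the minimal sketch below. It is an illustration only, not the authors' code: the function names and sample values are hypothetical, R is assumed to be the Pearson coefficient, and bias is assumed to be the signed mean difference between predicted and observed PWV, since the source does not say whether its bias is signed or absolute.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_bias(predicted, observed):
    """Signed mean difference, predicted minus observed (an assumed definition)."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


def rmse(predicted, observed):
    """Root mean square error between predictions and observations."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


# Hypothetical PWV values in mm, for illustration only (not data from the paper).
pwv_observed = [12.0, 25.5, 40.2, 55.1, 30.3]
pwv_predicted = [12.2, 25.3, 40.5, 55.0, 30.6]

print("R    =", round(pearson_r(pwv_predicted, pwv_observed), 4))
print("Bias =", round(mean_bias(pwv_predicted, pwv_observed), 4), "mm")
print("RMSE =", round(rmse(pwv_predicted, pwv_observed), 4), "mm")
```

If the study's bias is an absolute one, `p - o` in `mean_bias` would be replaced by `abs(p - o)`.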
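Table 1 Locations of the radiosonde stations in the Yangtze River Delta region

| Region | Station ID | Coordinates (latitude, longitude) | Elevation/m |
| --- | --- | --- | --- |
| Hangzhou | 58457 | (30.23°N, 120.16°E) | 43.00 |
| Quzhou | 58633 | (28.96°N, 118.86°E) | 71.00 |
| Shanghai | 58362 | (31.40°N, 121.46°E) | 4.00 |
| Anqing | 58424 | (30.53°N, 117.05°E) | 20.00 |
| Fuyang | 58203 | (32.86°N, 115.73°E) | 33.00 |
| Nanjing | 58238 | (32.00°N, 118.80°E) | 7.00 |
| Sheyang | 58150 | (33.76°N, 120.25°E) | 7.00 |

Table 2 Correlation coefficients between PWV and Tm, Ts, Ps, Es, ZHD, ZWD, and ZTD

| Dependent variable | Independent variable | R | Independent variable | Independent variable | R | Independent variable | Independent variable | R | Independent variable | Independent variable | R |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PWV | Ts | 0.76 | Ts | Ps | –0.60 | Ps | Tm | –0.65 | Es | ZTD | 0.66 |
| PWV | Ps | –0.59 | Ts | Es | 1.00 | Ps | ZHD | 0.74 | Tm | ZHD | –0.49 |
| PWV | Es | 0.76 | Ts | Tm | 0.94 | Ps | ZWD | –0.58 | Tm | ZWD | 0.73 |
| PWV | Tm | 0.74 | Ts | ZHD | –0.46 | Ps | ZTD | –0.37 | Tm | ZTD | 0.62 |
| PWV | ZHD | –0.43 | Ts | ZWD | 0.75 | Es | Tm | 0.93 | ZHD | ZWD | –0.40 |
| PWV | ZTD | 0.94 | Ts | ZTD | 0.66 | Es | ZHD | –0.43 | ZHD | ZTD | –0.09 |
| PWV | ZWD | 1.00 | Ps | Es | –0.58 | Es | ZWD | 0.75 | ZWD | ZTD | 0.95 |

Table 3 Tolerance (Tol) between PWV, Tm, Ts, Ps, Es, ZHD, ZWD, and ZTD

| Independent variable | Independent variable | Tol | Independent variable | Independent variable | Tol | Independent variable | Independent variable | Tol |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ts | Tm | 0.68 | Tm | Es | 0.68 | Ps | ZTD | 0.69 |
| Ts | Ps | 0.79 | Tm | ZWD | 0.45 | Es | ZWD | 0.42 |
| Ts | Es | 0.67 | Tm | ZHD | 0.90 | Es | ZHD | 0.89 |
| Ts | ZWD | 0.42 | Tm | ZTD | 0.52 | Es | ZTD | 0.49 |
| Ts | ZHD | 0.89 | Ps | Es | 0.79 | ZTD | ZWD | 0.12 |
| Ts | ZTD | 0.49 | Ps | ZWD | 0.65 | ZTD | ZHD | 0.84 |
| Tm | Ps | 0.68 | Ps | ZHD | 0.94 | ZWD | ZHD | 0.82 |

Table 4 Comparison of cross-validation accuracy on the 2014–2018 validation set (unit: mm)

| Radiosonde station | K | RMSE | Bias |
| --- | --- | --- | --- |
| Nanjing | 5 | 0.19 | 0.10 |
| Nanjing | 10 | 0.19 | 0.09 |
| Nanjing | 15 | 0.18 | 0.09 |
| Hangzhou | 5 | 0.20 | 0.10 |
| Hangzhou | 10 | 0.18 | 0.09 |
| Hangzhou | 15 | 0.18 | 0.09 |
| Anqing | 5 | 0.27 | 0.11 |
| Anqing | 10 | 0.26 | 0.10 |
| Anqing | 15 | 0.26 | 0.10 |
| Fuyang | 5 | 0.39 | 0.12 |
| Fuyang | 10 | 0.39 | 0.12 |
| Fuyang | 15 | 0.39 | 0.12 |
| Quzhou | 5 | 0.29 | 0.12 |
| Quzhou | 10 | 0.29 | 0.11 |
| Quzhou | 15 | 0.29 | 0.11 |
| Shanghai | 5 | 0.24 | 0.09 |
| Shanghai | 10 | 0.24 | 0.09 |
| Shanghai | 15 | 0.24 | 0.09 |
| Sheyang | 5 | 0.25 | 0.10 |
| Sheyang | 10 | 0.24 | 0.10 |
| Sheyang | 15 | 0.24 | 0.10 |
| Mean | 5 | 0.26 | 0.11 |
| Mean | 10 | 0.26 | 0.10 |
| Mean | 15 | 0.25 | 0.10 |

Table 5 Candidate values for the maximum depth, number of leaves, number of iterations, and learning rate

| Parameter | Maximum depth | Number of leaves | Number of iterations | Learning rate |
| --- | --- | --- | --- | --- |
| Candidate values | 10, 30, 50, 70, 90, 110 | 5, 10, 15, 30, 45, 60, 75 | 700, 800, 900, 1000 | 0.03, 0.05, 0.10 |

Table 6 Accuracy statistics of the LightGBM-PWV model for 2019 (unit: mm)

| Radiosonde station | Anqing | Fuyang | Hangzhou | Nanjing | Quzhou | Shanghai | Sheyang | Mean |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bias | 0.10 | 0.12 | 0.09 | 0.09 | 0.11 | 0.09 | 0.10 | 0.10 |
| RMSE | 0.26 | 0.39 | 0.18 | 0.18 | 0.29 | 0.24 | 0.24 | 0.25 |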
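To give a sense of the search space implied by Table 5 and of the K-fold splits behind Table 4, the sketch below enumerates every combination of the candidate values and builds simple K-fold index splits. It rests on several assumptions: the source does not say that an exhaustive grid search was used, the keys `max_depth`, `num_leaves`, `n_estimators`, and `learning_rate` are the scikit-learn-style LightGBM parameter names that the table columns are assumed to correspond to, and the helpers `iter_param_combinations` and `k_fold_indices` are hypothetical. Model training is deliberately left out.

```python
from itertools import product

# Candidate values copied from Table 5; the key names are an assumed mapping
# onto LightGBM's scikit-learn-style parameters.
PARAM_GRID = {
    "max_depth": [10, 30, 50, 70, 90, 110],
    "num_leaves": [5, 10, 15, 30, 45, 60, 75],
    "n_estimators": [700, 800, 900, 1000],
    "learning_rate": [0.03, 0.05, 0.10],
}


def iter_param_combinations(grid):
    """Yield one dict per combination of the candidate values in `grid`."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k contiguous, unshuffled folds."""
    # The first (n_samples % k) folds receive one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


combos = list(iter_param_combinations(PARAM_GRID))
print("parameter combinations:", len(combos))

# K values compared in Table 4; the sample count of 30 is arbitrary.
for k in (5, 10, 15):
    folds = list(k_fold_indices(30, k))
    print("K =", k, "-> validation fold sizes:", [len(val) for _, val in folds])
```

In an actual search, each combination would be passed to a LightGBM regressor and scored on every validation fold. The folds here are contiguous for simplicity, whereas shuffled folds are more common in practice.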