Evaluation of the performance of data-driven approaches for filling monthly precipitation gaps in a semi-arid climate conditions

Katipoğlu, OKAN

doi:10.1007/s11600-022-00963-9

Katipoğlu O. M.

Acta Geophysica, vol.71, no.5, pp.2265-2285, 2023 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 71 Issue: 5
Publication Date: 2023
DOI Number: 10.1007/s11600-022-00963-9
Journal Name: Acta Geophysica
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Communication Abstracts, Compendex, Geobase, INSPEC, Metadex, Civil Engineering Abstracts
Page Numbers: pp.2265-2285
Keywords: Artificial neural networks (ANN), Gaussian processes regression (GPR), Missing data, Precipitation, Support vector regression (SVR), Ensemble regression trees (RT), Linear regression (LR)
Erzincan Binali Yıldırım University Affiliated: Yes

Abstract

Missing data cause problems in meteorological, hydrological, and climate analysis. Observation data should be complete and cover long periods to make research more accurate and reliable. In recent years, artificial intelligence techniques have attracted interest for completing incomplete meteorological data. In this study, the ability of several machine learning models (artificial neural networks, the nonlinear autoregressive with exogenous input (NARX) model, support vector regression, Gaussian processes regression, boosted tree, bagged tree (BAT), and linear regression) to fill in missing precipitation data was investigated. In developing the machine learning models, 70% of the dataset was used for training, 15% for testing, and 15% for validation. The Bayburt, Tercan, and Zara precipitation stations, which are closest to the Erzincan station and have the highest correlation coefficients with it, were used to fill the data gaps. The accuracy of the constructed models was tested using statistical criteria, namely root-mean-square error (RMSE), mean absolute error (MAE), the Nash–Sutcliffe model efficiency coefficient (NSE), and the coefficient of determination (R²), and graphical approaches such as scatter plots, box plots, violin plots, and Taylor diagrams. Based on the comparison of model results, it was concluded that the BAT model, with R²: 0.79, NSE: 0.79, RMSE: 11.42, and MAE: 7.93, was the most successful in completing missing monthly precipitation data. The contribution of this research is to assist in the choice of the best and most accurate method for estimating precipitation data in semi-arid regions like Erzincan.
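As an illustration of the evaluation criteria named in the abstract, the following is a minimal Python sketch of RMSE, MAE, NSE, and R², together with a 70/15/15 split of record indices. It is not the author's code. The function names, the random shuffling, and the fixed seed are assumptions made for the example, and R² is assumed here to be the squared Pearson correlation between observed and estimated values, which is a common convention in hydrology but is not confirmed by the abstract.

```python
import numpy as np


def rmse(obs, sim):
    """Root-mean-square error between observed and estimated values."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def mae(obs, sim):
    """Mean absolute error between observed and estimated values."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.mean(np.abs(obs - sim)))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared error
    to the variance of the observations around their mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def r2(obs, sim):
    """Coefficient of determination, assumed here to be the squared
    Pearson correlation (an assumption, not taken from the paper)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    return float(r ** 2)


def split_indices(n, seed=0):
    """Shuffle n record indices and split them 70/15/15 into training,
    testing, and validation subsets (shuffling and seed are assumptions)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = (70 * n) // 100
    n_test = (15 * n) // 100
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    valid = idx[n_train + n_test:]
    return train, test, valid


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the study.
    observed = [10.0, 20.0, 30.0, 40.0]
    estimated = [12.0, 18.0, 33.0, 39.0]
    print("RMSE:", rmse(observed, estimated))
    print("MAE:", mae(observed, estimated))
    print("NSE:", nse(observed, estimated))
    print("R2:", r2(observed, estimated))
```

An NSE of 1 indicates a perfect match, while a value of 0 means the estimates are no better than using the mean of the observations, which is why NSE is often reported alongside RMSE and MAE when comparing gap-filling models.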