Acta Geophysica, vol. 71, no. 5, pp. 2265-2285, 2023 (SCI-Expanded)
Missing data cause problems in meteorological, hydrological, and climate analyses. Observation records should be complete and cover long periods for research to be accurate and reliable. In recent years, artificial intelligence techniques have attracted interest for completing incomplete meteorological records. In this study, we investigated how well several machine learning models fill in missing precipitation data: artificial neural networks, the nonlinear autoregressive with exogenous input (NARX) model, support vector regression, Gaussian process regression, boosted trees, bagged trees (BAT), and linear regression. In developing the machine learning models, 70% of the dataset was used for training, 15% for testing, and 15% for validation. The Bayburt, Tercan, and Zara precipitation stations, which are the closest to the Erzincan station and have the highest correlation coefficients with it, were used to fill the data gaps. Model accuracy was evaluated with statistical criteria such as root-mean-square error (RMSE), mean absolute error (MAE), the Nash–Sutcliffe model efficiency coefficient (NSE), and the coefficient of determination (R²). Graphical approaches such as scatter plots, box plots, violin plots, and Taylor diagrams were also used. A comparison of the model results showed that the BAT model (R² = 0.79, NSE = 0.79, RMSE = 11.42, MAE = 7.93) was the most successful at completing missing monthly precipitation data. This research can assist in choosing the most accurate method for estimating missing precipitation data in semi-arid regions such as Erzincan.
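The evaluation criteria named above have standard definitions. Below is a minimal Python sketch that uses only the standard library. It computes RMSE, MAE, NSE, and R², and it includes a 70/15/15 split helper.

It rests on two assumptions:
* R² is taken as the squared Pearson correlation between the observed and estimated series. This is a common convention in hydrology, but the abstract does not state which definition was used.
* The split is a simple ordered partition into training, testing, and validation segments. The abstract does not say whether the split was chronological or random.

The function names are hypothetical, and this is not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as the data."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Mean absolute error, in the same units as the data."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squares about the observed mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def r2(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Coefficient of determination, ASSUMED here to be the squared Pearson correlation."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)


def split_70_15_15(data: Sequence[float]) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical ordered split into training (70%), testing (15%), validation (rest)."""
    n = len(data)
    n_train = int(round(0.70 * n))
    n_test = int(round(0.15 * n))
    train = list(data[:n_train])
    test = list(data[n_train:n_train + n_test])
    valid = list(data[n_train + n_test:])
    return train, test, valid


if __name__ == "__main__":
    # Toy monthly values, illustrative only (not data from the study).
    observed = [10.0, 20.0, 30.0, 40.0]
    estimated = [12.0, 18.0, 33.0, 37.0]
    print("RMSE", rmse(observed, estimated))
    print("MAE ", mae(observed, estimated))
    print("NSE ", nse(observed, estimated))
    print("R2  ", r2(observed, estimated))
```

Under these definitions, NSE and R² can differ. NSE penalizes systematic bias, while the squared correlation only measures linear association. A model can therefore have a high R² but a lower NSE if its estimates are consistently offset from the observations.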