Data Mining Methodologies Applied to the Forecast of COVID-19 Cases in Peru

Authors

  • Roberto León Leyva, National University of Engineering

DOI:

https://doi.org/10.71701/03gcw835

Keywords:

Data mining, forecast, COVID-19, Peru, time series, ARIMA, CRISP-DM, open government data

Abstract

The CRISP-DM data mining methodology is applied to open government data on COVID-19 in Peru, and time series techniques are used to identify the models that best support forecasts of confirmed cases. The phases of the methodology are applied iteratively: data cleaning, detection of insights, selection of the ARIMA (autoregressive integrated moving average) model for time series analysis, and estimation of the parameters that characterize each series. The study concludes that, for these data, daily counts are unsuitable because they differ significantly by day of the week, and that the week is the best unit for grouping cases. On this basis, it finds that no single model holds at the country level or the department level, so it proposes provincial-level models that are statistically significant for making short-term forecasts.
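
The two steps named in the abstract, grouping daily counts by week and fitting an ARIMA model to the grouped series, can be illustrated with a minimal sketch. It is not the author's code: the daily counts are invented, the function names are hypothetical, and the order ARIMA(1,1,0) without a constant is assumed only to keep the fit short enough to write by hand. A real analysis would compare several candidate orders and test the residuals using a statistical package.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(daily_counts):
    """Sum daily case counts into ISO weeks, in chronological order."""
    totals = defaultdict(int)
    for day, cases in daily_counts.items():
        iso_year, iso_week, _ = day.isocalendar()
        totals[(iso_year, iso_week)] += cases
    return [totals[key] for key in sorted(totals)]


def fit_arima_110(series):
    """Estimate phi for ARIMA(1,1,0) without a constant.

    The series is differenced once; phi is then the least-squares slope
    of each difference on the previous difference (conditional least
    squares for an AR(1) model on the differenced series).
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    numerator = sum(x * y for x, y in zip(diffs, diffs[1:]))
    denominator = sum(x * x for x in diffs[:-1])
    if denominator == 0:
        raise ValueError("differenced series carries no information")
    return numerator / denominator


def forecast_next(series, phi):
    """One-step forecast: last value plus phi times the last difference."""
    return series[-1] + phi * (series[-1] - series[-2])


# Invented data (assumption): four full weeks starting on a Monday, with
# fewer cases reported at weekends and a per-day offset for each week.
start = date(2021, 1, 4)
weekday_pattern = [120, 110, 105, 100, 95, 60, 30]
weekly_offset = [0, 20, 30, 35]
daily = {
    start + timedelta(days=i): weekday_pattern[i % 7] + weekly_offset[i // 7]
    for i in range(28)
}

weekly = weekly_totals(daily)
phi = fit_arima_110(weekly)
print(weekly, phi, forecast_next(weekly, phi))
```

Summing by ISO week removes the within-week reporting pattern before any model is fitted, which is the motivation the abstract gives for preferring weekly data over daily data.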

Published

2024-10-11

Issue

Section

Articles

How to Cite

Data Mining Methodologies Applied to the Forecast of COVID-19 Cases in Peru. (2024). Revista I+i, 15. https://doi.org/10.71701/03gcw835