Gender Pay Gap in the Peruvian Government: an approach from Data Science


  • Roberto León Leyva, National University of Engineering



Gender pay gap, female participation, gender non-discrimination law, job valuation, data cleaning and imputation, linear regression, clustering, pay gap profiles


The objective of this study is to analyze the existence of the gender wage gap in the Peruvian government and to identify the profiles of these gaps at the regional level, based on open data provided by the National Authority of the Civil Service (Servir) and applying data science methodologies and techniques. The research was applied and quantitative in nature, explanatory in level, and used a non-experimental longitudinal design (2017-2021). The methodology used was CRISP-DM (Cross-Industry Standard Process for Data Mining). The exploratory data analysis found that, at the national level, female participation is close to parity, averaging 47 %. The gender pay gap in the government followed a downward trend, from 13 % (2017) to 11 % (2021), a better figure than the general average for Latin America, which was 14 % (ILO, 2019). In addition, the correlation showed that higher female participation corresponds to a smaller pay gap. However, when the information is disaggregated by region, the heterogeneity of the gaps (interquartile range of 25 % for Apurimac and 2 % for Piura) and of their evolution becomes evident; in many regions the pandemic reversed or slowed the improvements in the gap. To determine which regions made progress, a simple linear regression model of the gap on the year was used: a negative slope indicates progress, and a positive slope indicates a reversal over the period under study. The regions that made the most progress were Moquegua, Huánuco and Ancash, while those that showed deterioration were Huancavelica, La Libertad and San Martin. To identify the gap profiles, data for 2021 were used and variables obtained from the INEI were added, such as GDP per capita and coverage per public servant, the first associated with the region's progress and the second with government services. As a result, two clusters were determined, with high (12.7 %) and low (8.9 %) median pay gap values.
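The slope-based reading of regional progress described above can be illustrated with a minimal Python sketch. It assumes one pay gap value per region and year and fits an ordinary least-squares line of the gap on the year. The region names and gap values are hypothetical placeholders, not Servir data, and the function names `ols_slope` and `classify_trend` are introduced here only for illustration.

```python
def ols_slope(years, gaps):
    """Least-squares slope of the pay gap (in %) regressed on the year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(gaps) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, gaps))
    return sxy / sxx


def classify_trend(slope):
    """Negative slope: the gap narrows (progress); positive slope: it widens."""
    if slope < 0:
        return "progress"
    if slope > 0:
        return "reversal"
    return "no change"


years = [2017, 2018, 2019, 2020, 2021]

# Hypothetical values for illustration only; not taken from the study.
hypothetical_gaps = {
    "Region A": [15.0, 14.0, 13.0, 12.0, 11.0],
    "Region B": [8.0, 8.5, 9.0, 9.5, 10.0],
}

trends = {
    region: classify_trend(ols_slope(years, gaps))
    for region, gaps in hypothetical_gaps.items()
}
```

With only five yearly observations per region, the sign of the slope is a coarse indicator, so it is best read alongside the spread of the yearly values.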
The nonparametric Brown-Mood median test indicates that the pay gap persists despite the labor laws that ban pay discrimination in the labor market.
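For readers unfamiliar with the Brown-Mood (median) test, the following Python sketch shows how it works for two groups. The abstract does not state which groups were compared, so the two samples below are hypothetical pay gap values chosen only for illustration. The sketch omits the continuity correction that some statistical packages apply by default, and the function name `moods_median_test` is an assumption of this example.

```python
import math
from statistics import median


def moods_median_test(group_a, group_b):
    """Two-sample Brown-Mood median test without continuity correction.

    Returns the chi-square statistic (1 degree of freedom) and its p-value.
    Assumes at least one value lies above and one at or below the grand median.
    """
    grand_median = median(list(group_a) + list(group_b))
    groups = (group_a, group_b)
    # 2x2 table: row 0 counts values above the grand median, row 1 the rest.
    table = [
        [sum(v > grand_median for v in g) for g in groups],
        [sum(v <= grand_median for v in g) for g in groups],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [len(g) for g in groups]
    n = sum(col_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical pay gaps (%) for two groups of regions; not data from the study.
high_gap_group = [12.1, 12.7, 13.4, 11.9, 14.0]
low_gap_group = [8.9, 9.3, 7.8, 10.2, 8.1]

chi2, p_value = moods_median_test(high_gap_group, low_gap_group)
```

A small p-value suggests that the two groups do not share a common median; the test makes no assumption about the shape of the underlying distributions.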


Author Biography

  • Roberto León Leyva, National University of Engineering

    Professor in the Big Data and Data Science program at Tecsup, systems engineer from the National University of Engineering (UNI), and MBA from Universidad del Pacífico. He has completed coursework for the master's degree in Applied Statistics at Universidad Agraria La Molina (Unalm) and holds specializations from the University of Michigan on Coursera (Data Analytics in the Public Sector with R and Applied Data Science with Python), as well as from Datacamp (Data Scientist and Quantitative Analyst with R).


