Reputation: 57
I repost here what I posted on stats exchange, having been told it was better suited for stack overflow. Here is the structure of my dataset for reproducibility :
structure(list(numero = c("133", "62", "75", "76", "86", "281"
), tranche_age = c("20-30", "20-30", "20-30", "20-30", "20-30",
"20-30"), tranche_anciennete = c("5 ans et moins", "5 à 10 ans",
"5 ans et moins", "5 ans et moins", "5 à 10 ans", "5 à 10 ans"
), code_statut = c("C", "E", "E", "E", "E", "E"), code_contrat = c("A",
"A", "A", "A", "A", "A"), taux_demploi_mois = c(100, 100, 100,
100, 100, 100), echelon = c("E1", NA, NA, NA, NA, NA), niveau = c("N7",
NA, NA, NA, NA, NA), brut_mensuel = c(NA, 786.13, 1156.95, 1156.95,
904.79, 904.79), estimation_annuelle = c(NA, 10219.69, 15040.35,
15040.35, 11762.27, 11762.27), annee = c(2017, 2017, 2017, 2017,
2017, 2017), primes_en_montant = c(0, 0, 0, 0, 0, 0), primes_en_pourcentage =
c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), brut_mensuel_ETP = c(NA,
786.13, 1156.95, 1156.95, 904.79, 904.79)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Each worker is identified with one number ("numero"), which doesn't change from year to year. I would like to compute a new variable, to add to this dataframe, representing the evolution of the "estimation_annuelle" (which is the yearly wage) of each worker, from year to year (from 2017 to 2021), and then the average annual growth rate over the 5 years. Then, I would like to view those who have less than a 2% raise on one year (2017-2018 for example), and see whether it has been caught up in the following years or no (that is, if one's wage has increased by less than 2% between 2017 and 2018, if the wage increased one had between 2018 and 2019 compensated, and by how much, the insufficient raise on the previous yearly period).
I have tried a code to compute the variable evolution from year to year, which doesn't work :
test <- liste_complete %>%
group_by(annee, numero) %>%
select(numero, annee, estimation_annuelle)%>%
data.frame()
for(i in 1:length(test$estimation_annuelle)) {
print((test[i+1,] - test[i,])/test[i,])
}
And I have not found anything to compute the average annual growth rate (here is the formula : https://investinganswers.com/dictionary/a/average-annual-growth-rate-aagr), nor computed whether the insufficient increase for those who are concerned has been made up for in the following years.
Could anyone help ?
Upvotes: 0
Views: 63
Reputation: 5958
We can use a summarise
then a match
.
df$annee <- c(2017, 2017, 2018,2018, 2019,2019)
df$brut_mensuel[1] <- 11000
# first, summarise
summary <- df %>% select(numero, annee, estimation_annuelle, brut_mensuel) %>%
group_by(annee) %>% summarise(estimation_annuelle=mean(brut_mensuel)) %>% arrange(annee) %>%
mutate(salaire_annee_prec = lag(estimation_annuelle),
variation_annee_precedente=(estimation_annuelle-salaire_annee_prec)/salaire_annee_prec)
# matching
df$variation_annee_prec <- summary$variation_annee_precedente[match(df$annee,summary$annee)]
df
# A tibble: 6 x 15
numero tranche_age tranche_anciennete code_statut code_contrat taux_demploi_mois echelon niveau brut_mensuel estimation_annuelle annee primes_en_montant
<chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 133 20-30 5 ans et moins C A 100 E1 N7 11000 NA 2017 0
2 62 20-30 5 à 10 ans E A 100 NA NA 786. 10220. 2017 0
3 75 20-30 5 ans et moins E A 100 NA NA 1157. 15040. 2018 0
4 76 20-30 5 ans et moins E A 100 NA NA 1157. 15040. 2018 0
5 86 20-30 5 à 10 ans E A 100 NA NA 905. 11762. 2019 0
6 281 20-30 5 à 10 ans E A 100 NA NA 905. 11762. 2019 0
primes_en_pourcentage brut_mensuel_ETP variation_annee_prec
<dbl> <dbl> <dbl>
1 NA NA NA
2 NA 786. NA
3 NA 1157. -0.804
4 NA 1157. -0.804
5 NA 905. -0.218
6 NA 905. -0.218
Upvotes: 1