gerardlambert
gerardlambert

Reputation: 57

for loop in R to compute yearly evolution of a variable

I repost here what I posted on stats exchange, having been told it was better suited for stack overflow. Here is the structure of my dataset for reproducibility :

structure(list(numero = c("133", "62", "75", "76", "86", "281"
), tranche_age = c("20-30", "20-30", "20-30", "20-30", "20-30", 
"20-30"), tranche_anciennete = c("5 ans et moins", "5 à 10 ans", 
"5 ans et moins", "5 ans et moins", "5 à 10 ans", "5 à 10 ans"
), code_statut = c("C", "E", "E", "E", "E", "E"), code_contrat = c("A", 
"A", "A", "A", "A", "A"), taux_demploi_mois = c(100, 100, 100, 
100, 100, 100), echelon = c("E1", NA, NA, NA, NA, NA), niveau = c("N7", 
NA, NA, NA, NA, NA), brut_mensuel = c(NA, 786.13, 1156.95, 1156.95, 
904.79, 904.79), estimation_annuelle = c(NA, 10219.69, 15040.35, 
15040.35, 11762.27, 11762.27), annee = c(2017, 2017, 2017, 2017, 
2017, 2017), primes_en_montant = c(0, 0, 0, 0, 0, 0), primes_en_pourcentage = 
c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), brut_mensuel_ETP = c(NA, 
786.13, 1156.95, 1156.95, 904.79, 904.79)), row.names = c(NA, -6L), class = c("tbl_df", 
 "tbl", "data.frame"))

Each worker is identified with one number ("numero"), which doesn't change from year to year. I would like to compute a new variable, to add to this dataframe, representing the evolution of the "estimation_annuelle" (which is the yearly wage) of each worker, from year to year (from 2017 to 2021), and then the average annual growth rate over the 5 years. Then, I would like to view those who have less than a 2% raise on one year (2017-2018 for example), and see whether it has been caught up in the following years or no (that is, if one's wage has increased by less than 2% between 2017 and 2018, if the wage increased one had between 2018 and 2019 compensated, and by how much, the insufficient raise on the previous yearly period).

I have tried a code to compute the variable evolution from year to year, which doesn't work :

test <- liste_complete %>%
   group_by(annee, numero) %>%
   select(numero, annee, estimation_annuelle)%>%
   data.frame()

for(i in 1:length(test$estimation_annuelle)) {
  print((test[i+1,] - test[i,])/test[i,])
}

And I have not found anything to compute the average annual growth rate (here is the formula : https://investinganswers.com/dictionary/a/average-annual-growth-rate-aagr), nor computed whether the insufficient increase for those who are concerned has been made up for in the following years.

Could anyone help ?

Upvotes: 0

Views: 63

Answers (1)

gaut
gaut

Reputation: 5958

We can use a summarise then a match.

df$annee <- c(2017, 2017, 2018,2018, 2019,2019)
df$brut_mensuel[1] <- 11000

# first, summarise
summary <- df %>% select(numero, annee, estimation_annuelle, brut_mensuel) %>% 
  group_by(annee) %>% summarise(estimation_annuelle=mean(brut_mensuel)) %>% arrange(annee) %>% 
                            mutate(salaire_annee_prec = lag(estimation_annuelle),
                            variation_annee_precedente=(estimation_annuelle-salaire_annee_prec)/salaire_annee_prec)

# matching
df$variation_annee_prec <- summary$variation_annee_precedente[match(df$annee,summary$annee)]
df
# A tibble: 6 x 15
  numero tranche_age tranche_anciennete code_statut code_contrat taux_demploi_mois echelon niveau brut_mensuel estimation_annuelle annee primes_en_montant
  <chr>  <chr>       <chr>              <chr>       <chr>                    <dbl> <chr>   <chr>         <dbl>               <dbl> <dbl>             <dbl>
1 133    20-30       5 ans et moins     C           A                          100 E1      N7           11000                  NA   2017                 0
2 62     20-30       5 à 10 ans         E           A                          100 NA      NA             786.              10220.  2017                 0
3 75     20-30       5 ans et moins     E           A                          100 NA      NA            1157.              15040.  2018                 0
4 76     20-30       5 ans et moins     E           A                          100 NA      NA            1157.              15040.  2018                 0
5 86     20-30       5 à 10 ans         E           A                          100 NA      NA             905.              11762.  2019                 0
6 281    20-30       5 à 10 ans         E           A                          100 NA      NA             905.              11762.  2019                 0
  primes_en_pourcentage brut_mensuel_ETP variation_annee_prec
                  <dbl>            <dbl>                <dbl>
1                    NA              NA                NA    
2                    NA             786.               NA    
3                    NA            1157.               -0.804
4                    NA            1157.               -0.804
5                    NA             905.               -0.218
6                    NA             905.               -0.218

Upvotes: 1

Related Questions