Notna

Reputation: 525

Average percentage change over different years in R

I have a data frame from which I created a reproducible example:

country <- c('A','A','A','B','B','C','C','C','C')
year <- c(2010,2011,2015,2008,2009,2008,2009,2011,2015)
score <- c(1,2,2,1,4,1,1,3,2)
df <- data.frame(country, year, score)

  country year score
1       A 2010     1
2       A 2011     2
3       A 2015     2
4       B 2008     1
5       B 2009     4
6       C 2008     1
7       C 2009     1
8       C 2011     3
9       C 2015     2

I am trying to calculate the average percentage increase (or decrease) in the score for each country: compute [(current score - previous score) ÷ (previous score)] for each observed year, then average those changes within each country. The intermediate change column would look like this:

  country year score change
1       A 2010     1     NA
2       A 2011     2      1
3       A 2015     2      0
4       B 2008     1     NA
5       B 2009     4      3
6       C 2008     1     NA
7       C 2009     1      0
8       C 2011     3      2
9       C 2015     2  -0.33

The final result I am hoping to obtain:

  country  avg_change
1       A         0.5
2       B           3
3       C        0.55

As you can see, the difficulty is that the countries span different years, sometimes with missing years in between. I tried several ways to do this manually but I am struggling. If someone could point me toward a solution, that would be great. Many thanks.

Upvotes: 2

Views: 2787

Answers (2)

akrun

Reputation: 887173

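We can use data.table to group by 'country' and take the mean of the difference between the 'score' and the lag of 'score'. In data.table the lag is computed with shift(); a bare lag() here would resolve to stats::lag unless dplyr is loaded, and stats::lag does not shift a plain vector. Note that this averages the absolute differences; divide by the lagged score to get the relative change asked for in the question.

library(data.table)
# shift(score) is the previous score within each country (NA for the first row)
setDT(df)[, .(avg_change = mean(score - shift(score), na.rm = TRUE)), .(country)]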
#   country avg_change
#1:       A  0.5000000
#2:       B  3.0000000
#3:       C  0.3333333
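
The result above matches the question's expected output for A and B but not for C, because it averages raw differences rather than relative changes. A minimal sketch of the relative-change variant follows, assuming the data are rebuilt from the question's example; the names `dt` and `res` are illustrative and not from the original post.

```r
library(data.table)

# Rebuild the question's example data (column names taken from the question).
dt <- data.table(
  country = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  year    = c(2010, 2011, 2015, 2008, 2009, 2008, 2009, 2011, 2015),
  score   = c(1, 2, 2, 1, 4, 1, 1, 3, 2)
)

# Make sure rows are in chronological order within each country.
setorder(dt, country, year)

# Relative change = (score - previous score) / previous score, averaged per country.
# The first row of each country has no previous score, so it yields NA and is dropped.
res <- dt[, .(avg_change = mean((score - shift(score)) / shift(score), na.rm = TRUE)),
          by = country]
print(res)
```

For the example data this gives 0.5 for A, 3 for B and about 0.556 for C, which agrees with the 0.55 shown in the question. Gaps between years are ignored: each change is measured against the previous available observation.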

Upvotes: 2

Ronak Shah

Reputation: 388982

With dplyr, we can group_by country and get the mean of the differences between consecutive scores.

library(dplyr)

df %>%
  group_by(country) %>%
  summarise(avg_change = mean(c(NA, diff(score)), na.rm = TRUE))

# country avg_change
#  <fct>        <dbl>
#1  A            0.500
#2  B            3.00 
#3  C            0.333

Using base R aggregate with the same logic:

aggregate(score~country, df, function(x) mean(c(NA, diff(x)), na.rm = TRUE))
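
If the relative change from the question is wanted instead of the raw difference, the same aggregate approach can divide each difference by the preceding score. This is a sketch under the assumption that rows are sorted by year within each country; the helper name `rel_change` is hypothetical.

```r
# Rebuild the question's example data.
df <- data.frame(
  country = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  year    = c(2010, 2011, 2015, 2008, 2009, 2008, 2009, 2011, 2015),
  score   = c(1, 2, 2, 1, 4, 1, 1, 3, 2)
)

# Sort chronologically within each country so diff() compares consecutive observations.
df <- df[order(df$country, df$year), ]

# Hypothetical helper: mean of (current - previous) / previous.
# head(x, -1) drops the last element, leaving the "previous" score for each difference.
rel_change <- function(x) mean(diff(x) / head(x, -1))

res <- aggregate(score ~ country, data = df, FUN = rel_change)
names(res)[2] <- "avg_change"
print(res)
```

A country with a single observation has no differences, so `rel_change` returns NaN for it; whether that should be reported or filtered out depends on the use case.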

Upvotes: 6
