How can I calculate percentage changes of multiple variables and time points at once?

Question

Using dplyr syntax, I am looking for an "elegant" way to mutate new variables to a given data frame containing percentage changes across different time points.

As an example, consider the following data.frame containing variables (var.a, var.b, var.c) taken at different time points (A, B).

set.seed(123)
df <- data.frame(replicate(6,sample(1:100,50,rep=TRUE)))
names(df) <- c("A.var.a", "A.var.b", "A.var.c", 
               "B.var.a", "B.var.b", "B.var.c")

How could I calculate percentage changes from A to B of each respective variable, and mutate them to the data frame df ? I am looking for a dplyr or any other "smart" solution that I could implement in a bigger data frame containing more time points and variables, avoiding typing one by one new variable containing the desired output.

lukeA · Accepted Answer

You could tidy it up and then aggregate each pair. Something like

library(tidyverse)
df %>% tibble::rowid_to_column() %>% 
  gather(key, value, -rowid) %>% 
  separate(key, c("time", "var"), extra = "merge") %>% # split "A.var.a" into "A" & "var.a"
  arrange(rowid, var, time) %>%
  group_by(rowid, var) %>%
  summarise(diff = diff(value)/value[1]*100) %>%
  spread(var, diff) 
# # A tibble: 50 x 4
# # Groups:   rowid [50]
#    rowid      var.a     var.b     var.c
#  *                 
#  1     1 193.103448 380.00000 -56.66667
#  2     2 -36.708861 115.55556 -32.35294
#  3     3  -4.878049 -23.75000  22.44898
#  4     4 -71.910112 300.00000 -71.87500
#  5     5 -87.368421 -28.07018  10.20408
#  6     6 680.000000 323.80952 -12.22222
#  7     7   9.433962 184.61538 -81.52174
#  8     8 -75.555556 -61.84211 -32.78689
#  9     9 -19.642857 -80.00000  14.28571
# 10    10 -52.173913 -52.63158 480.00000
# # ... with 40 more rows

?

To add the columns to your original data, you could add %>% ungroup() %>% select(-rowid) %>% rename_all(~paste0("perc_", .x)) %>% bind_cols(df) to the above lines.

Or try a rowwise approach in the veins of

 f <- function(x) {
  as.list(diff(x, lag=3)/x[1:3]*100)
}
df %>% 
  rowwise %>% 
  do(data.frame(perc = f(unlist(.)))
  ) %>% 
  bind_cols(df)
# Source: local data frame [50 x 9]
# Groups: 
# 
# # A tibble: 50 x 9
#    perc.B.var.a perc.B.var.b perc.B.var.c A.var.a A.var.b A.var.c B.var.a B.var.b B.var.c
#                                             
#  1   193.103448    380.00000    -56.66667      29       5      60      85      24      26
#  2   -36.708861    115.55556    -32.35294      79      45      34      50      97      23
#  3    -4.878049    -23.75000     22.44898      41      80      49      39      61      60
#  4   -71.910112    300.00000    -71.87500      89      13      96      25      52      27
#  5   -87.368421    -28.07018     10.20408      95      57      49      12      41      54
#  6   680.000000    323.80952    -12.22222       5      21      90      39      89      79
#  7     9.433962    184.61538    -81.52174      53      13      92      58      37      17
#  8   -75.555556    -61.84211    -32.78689      90      76      61      22      29      41
#  9   -19.642857    -80.00000     14.28571      56      90      42      45      18      48
# 10   -52.173913    -52.63158    480.00000      46      38      15      22      18      87
# # ... with 40 more rows

Both may need some tweaking...

How can I calculate percentage changes of multiple variables and time points at once?

Answers (1)

Related Questions