Reputation: 1194
Using dplyr
syntax, I am looking for an "elegant" way to mutate
new variables to a given data frame containing percentage changes across different time points.
As an example, consider the following data.frame
containing variables (var.a
, var.b
, var.c
) taken at different time points (A
, B
).
set.seed(123)
df <- data.frame(replicate(6,sample(1:100,50,rep=TRUE)))
names(df) <- c("A.var.a", "A.var.b", "A.var.c",
"B.var.a", "B.var.b", "B.var.c")
How could I calculate percentage changes from A
to B
of each respective variable, and mutate
them to the data frame df
? I am looking for a dplyr
or any other "smart" solution that I could implement in a bigger data frame containing more time points and variables, avoiding typing one by one new variable containing the desired output.
Upvotes: 0
Views: 450
Reputation: 54267
You could tidy it up and then aggregate each pair. Something like
library(tidyverse)
df %>% tibble::rowid_to_column() %>%
gather(key, value, -rowid) %>%
separate(key, c("time", "var"), extra = "merge") %>% # split "A.var.a" into "A" & "var.a"
arrange(rowid, var, time) %>%
group_by(rowid, var) %>%
summarise(diff = diff(value)/value[1]*100) %>%
spread(var, diff)
# # A tibble: 50 x 4
# # Groups: rowid [50]
# rowid var.a var.b var.c
# * <int> <dbl> <dbl> <dbl>
# 1 1 193.103448 380.00000 -56.66667
# 2 2 -36.708861 115.55556 -32.35294
# 3 3 -4.878049 -23.75000 22.44898
# 4 4 -71.910112 300.00000 -71.87500
# 5 5 -87.368421 -28.07018 10.20408
# 6 6 680.000000 323.80952 -12.22222
# 7 7 9.433962 184.61538 -81.52174
# 8 8 -75.555556 -61.84211 -32.78689
# 9 9 -19.642857 -80.00000 14.28571
# 10 10 -52.173913 -52.63158 480.00000
# # ... with 40 more rows
?
To add the columns to your original data, you could add %>% ungroup() %>% select(-rowid) %>% rename_all(~paste0("perc_", .x)) %>% bind_cols(df)
to the above lines.
Or try a rowwise approach in the veins of
f <- function(x) {
as.list(diff(x, lag=3)/x[1:3]*100)
}
df %>%
rowwise %>%
do(data.frame(perc = f(unlist(.)))
) %>%
bind_cols(df)
# Source: local data frame [50 x 9]
# Groups: <by row>
#
# # A tibble: 50 x 9
# perc.B.var.a perc.B.var.b perc.B.var.c A.var.a A.var.b A.var.c B.var.a B.var.b B.var.c
# <dbl> <dbl> <dbl> <int> <int> <int> <int> <int> <int>
# 1 193.103448 380.00000 -56.66667 29 5 60 85 24 26
# 2 -36.708861 115.55556 -32.35294 79 45 34 50 97 23
# 3 -4.878049 -23.75000 22.44898 41 80 49 39 61 60
# 4 -71.910112 300.00000 -71.87500 89 13 96 25 52 27
# 5 -87.368421 -28.07018 10.20408 95 57 49 12 41 54
# 6 680.000000 323.80952 -12.22222 5 21 90 39 89 79
# 7 9.433962 184.61538 -81.52174 53 13 92 58 37 17
# 8 -75.555556 -61.84211 -32.78689 90 76 61 22 29 41
# 9 -19.642857 -80.00000 14.28571 56 90 42 45 18 48
# 10 -52.173913 -52.63158 480.00000 46 38 15 22 18 87
# # ... with 40 more rows
Both may need some tweaking...
Upvotes: 1