Reputation: 1187
I have a data frame consisting of columns in the form %Y_%m, with two values of %Y for each %m, like so:
library(dplyr)
df <- data.frame(cat = c("cat1", "cat2", "cat3", "cat4"),
                 `2017_1` = c(25, 48, 34, 72),
                 `2018_1` = c(50, 34, 32, 18),
                 `2017_2` = c(30, 23, 84, 29),
                 `2018_2` = c(28, 48, 21, 15))
colnames(df) <- sub("X", "", colnames(df))
> df
   cat 2017_1 2018_1 2017_2 2018_2
1 cat1     25     50     30     28
2 cat2     48     34     23     48
3 cat3     34     32     84     21
4 cat4     72     18     29     15
I would like to compute year-over-year (YOY) percentage changes by month, where YOY(month) = value(month[most current])/value(month[least current]) - 1.
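As a quick sanity check of what I mean (illustrative only; the variable names here are made up for this example): the code below subtracts 1 from the ratio, so for cat1 in month 1 the value should work out like this:

```r
# cat1, month 1: the 2017 value is 25 and the 2018 value is 50
value_2017 <- 25
value_2018 <- 50

# ratio of most current to least current year, minus 1
yoy_cat1_m1 <- value_2018 / value_2017 - 1
yoy_cat1_m1
```

That is a 100% increase, which matches the first cell of the desired output further down.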
mth <- sub(".*_", "", colnames(df))
mth <- mth[mth != "cat"]
mth <- unique(mth)
for (i in 1:length(mth)){
  temp <- df %>%
    select(cat, ends_with(paste0("_", mth[i])))
  yrs <- regmatches(colnames(df), regexpr("^.*?(?=_)", colnames(df), perl = TRUE))
  mutate_str <- paste0("YOY_", mth[i], " = `", yrs[2], "_",
                       mth[i], "`/`", yrs[1], "_", mth[i], "`-1")
  rm(yrs)
  temp <- temp %>%
    mutate_(mutate_str)
  rm(mutate_str)
  temp <- temp %>%
    select(cat, contains("YOY"))
  if (i == 1){
    output <- temp
    rm(temp)
  } else {
    output <- inner_join(output, temp, by = "cat")
    rm(temp)
  }
}
rm(i, mth)
colnames(output) <- sub(" =.*", "", colnames(output))
The desired output, which the code above already produces, is as follows:
> output
   cat       YOY_1       YOY_2
1 cat1  1.00000000 -0.06666667
2 cat2 -0.29166667  1.08695652
3 cat3 -0.05882353 -0.75000000
4 cat4 -0.75000000 -0.48275862
I understand that, given the latest update to dplyr, mutate_() will be phased out. How can the mutate_() statement, in particular, be rewritten to conform to the new syntax?
Please do not hard code the column names. There are more columns in my actual data set than in this example I've provided, but the idea is that there are only two years for each month.
Upvotes: 1
Views: 75
Reputation: 206576
Your data is not in a tidy format, which is why the dplyr code is such a struggle here. The much easier thing to do would be to transform your data into a tidy format (with tidyr), then do your processing. This will do the same thing without any loops or dynamic expression building.
library(dplyr)
library(tidyr)
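df %>%
  gather(x, value, -cat) %>%           # wide to long: one row per cat/column pair
  separate(x, c("year", "index")) %>%  # split e.g. "2017_1" into year and month index
  group_by(cat, index) %>%
  arrange(year) %>%                    # sort by year so first()/last() pick earliest/latest
  summarize(yoy = last(value) / first(value) - 1) %>%
  mutate(index = paste("YOY", index, sep = "_")) %>%
  spread(index, yoy)                   # back to wide: one YOY column per month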
#      cat       YOY_1       YOY_2
# * <fctr>       <dbl>       <dbl>
# 1   cat1  1.00000000 -0.06666667
# 2   cat2 -0.29166667  1.08695652
# 3   cat3 -0.05882353 -0.75000000
# 4   cat4 -0.75000000 -0.48275862
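If you would rather keep the wide layout, the mutate_() call itself can probably be replaced with tidy evaluation. What follows is only a sketch, assuming dplyr 0.7 or later (for !!, := and the .data pronoun). The names months, cols, earlier, later, new_col and yoy are mine, not from the question, and df is rebuilt with check.names = FALSE so that the sub("X", ...) step is not needed.

```r
library(dplyr)

# Same data as in the question; check.names = FALSE keeps "2017_1" etc. as-is
df <- data.frame(cat = c("cat1", "cat2", "cat3", "cat4"),
                 `2017_1` = c(25, 48, 34, 72),
                 `2018_1` = c(50, 34, 32, 18),
                 `2017_2` = c(30, 23, 84, 29),
                 `2018_2` = c(28, 48, 21, 15),
                 check.names = FALSE)

# Month suffixes taken from the column names, nothing hard coded
months <- unique(sub(".*_", "", setdiff(colnames(df), "cat")))

yoy <- df
for (m in months) {
  # The two year columns for this month, sorted so the earlier year is first
  cols <- sort(grep(paste0("_", m, "$"), colnames(df), value = TRUE))
  earlier <- cols[1]
  later   <- cols[2]
  new_col <- paste0("YOY_", m)

  # !!new_col := sets the output column name from a string;
  # .data[[...]] looks up the input columns by name
  yoy <- yoy %>%
    mutate(!!new_col := .data[[later]] / .data[[earlier]] - 1)
}

output <- yoy %>%
  select(cat, starts_with("YOY_"))
output
```

Like the original loop, this assumes exactly two year columns per month, and that the column names sort so the earlier year comes first.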
Upvotes: 1