Clarinetist

Reputation: 1187

Executing mutate() with a string in dplyr 0.7.0

I have a data frame consisting of columns in the form %Y_%m with two values of %Y for each %m, like so:

library(dplyr)

df <- data.frame(cat = c("cat1", "cat2", "cat3", "cat4"),
                 `2017_1` = c(25, 48, 34, 72),
                 `2018_1` = c(50, 34, 32, 18),
                 `2017_2` = c(30, 23, 84, 29),
                 `2018_2` = c(28, 48, 21, 15))

colnames(df) <- sub("X", "", colnames(df))

> df
   cat 2017_1 2018_1 2017_2 2018_2
1 cat1     25     50     30     28
2 cat2     48     34     23     48
3 cat3     34     32     84     21
4 cat4     72     18     29     15

I would like to compute year-over-year (YOY) percentage changes by month, where YOY(month) = value(month[most recent year]) / value(month[earliest year]) - 1.
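
As a quick sanity check, the month-1 values can be worked out by hand in base R. This is only a sketch with the column names hard-coded for illustration; it subtracts 1 from the ratio so that the result is a percentage change, matching the output shown further down.

```r
# Rebuild part of the example data; check.names = FALSE keeps the
# "2017_1"-style column names exactly as written.
df <- data.frame(cat = c("cat1", "cat2", "cat3", "cat4"),
                 `2017_1` = c(25, 48, 34, 72),
                 `2018_1` = c(50, 34, 32, 18),
                 check.names = FALSE)

# YOY for month 1: later year divided by earlier year, minus 1.
yoy_1 <- df[["2018_1"]] / df[["2017_1"]] - 1
yoy_1
```

For cat1 this is 50 / 25 - 1 = 1, i.e. a 100% increase.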

mth <- sub(".*_", "", colnames(df))
mth <- mth[mth != "cat"]
mth <- unique(mth)

for (i in 1:length(mth)){
  temp <- df %>%
    select(cat, ends_with(paste0("_", mth[i])))
  yrs <- regmatches(colnames(df), regexpr("^.*?(?=_)", colnames(df), perl = TRUE)) 
  mutate_str <- paste0("YOY_", mth[i], " = `", yrs[2], "_", 
                       mth[i], "`/`", yrs[1], "_", mth[i], "`-1")
  rm(yrs)

  temp <- temp %>%
    mutate_(mutate_str)
  rm(mutate_str)

  temp <- temp %>%
    select(cat, contains("YOY"))
  if (i == 1){
    output <- temp
    rm(temp)
  } else {
    output <- inner_join(output, temp, by = "cat")
    rm(temp)
  }
}
rm(i, mth)

colnames(output) <- sub(" =.*", "", colnames(output))

The desired output, which the code above already produces, is as follows:

> output
   cat       YOY_1       YOY_2
1 cat1  1.00000000 -0.06666667
2 cat2 -0.29166667  1.08695652
3 cat3 -0.05882353 -0.75000000
4 cat4 -0.75000000 -0.48275862

I understand that, given the latest update to dplyr, mutate_() will be phased out. How can the mutate_() statement in particular be rewritten to conform to the new syntax?

Please do not hard code the column names. There are more columns in my actual data set than in this example I've provided, but the idea is that there are only two years for each month.

Upvotes: 1

Views: 75

Answers (1)

MrFlick

Reputation: 206576

Your data is not in a tidy format, which is why the dplyr code is such a struggle here. It would be much easier to transform your data into a tidy format (with tidyr) and then do your processing. The following does the same thing without any loops or dynamic expression building.

library(dplyr)
library(tidyr)

df %>% gather(x, value, -cat) %>%
  separate(x, c("year","index")) %>% 
  group_by(cat, index) %>% 
  arrange(year) %>%
  summarize(yoy=last(value)/first(value)-1) %>%
  mutate(index=paste("YOY", index, sep="_")) %>% 
  spread(index, yoy)

#      cat       YOY_1       YOY_2
# * <fctr>       <dbl>       <dbl>
# 1   cat1  1.00000000 -0.06666667
# 2   cat2 -0.29166667  1.08695652
# 3   cat3 -0.05882353 -0.75000000
# 4   cat4 -0.75000000 -0.48275862
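
If you would rather keep the loop, a direct replacement for the mutate_() call might look like the sketch below. It uses the `.data` pronoun to look columns up by string and `!!name :=` to set the new column's name, both of which are part of the tidy evaluation interface introduced with dplyr 0.7.0. The helper names (`value_cols`, `yrs`, `mth`, `old_name`, `new_name`, `yoy_name`) are made up for illustration, and the code assumes exactly two years per month, as stated in the question.

```r
library(dplyr)

df <- data.frame(cat = c("cat1", "cat2", "cat3", "cat4"),
                 `2017_1` = c(25, 48, 34, 72),
                 `2018_1` = c(50, 34, 32, 18),
                 `2017_2` = c(30, 23, 84, 29),
                 `2018_2` = c(28, 48, 21, 15),
                 check.names = FALSE)

# Derive the years and months from the column names instead of hard-coding them.
value_cols <- setdiff(colnames(df), "cat")
yrs <- sort(unique(sub("_.*", "", value_cols)))  # earliest year first
mth <- unique(sub(".*_", "", value_cols))

output <- df
for (m in mth) {
  old_name <- paste0(yrs[1], "_", m)  # e.g. "2017_1"
  new_name <- paste0(yrs[2], "_", m)  # e.g. "2018_1"
  yoy_name <- paste0("YOY_", m)

  # .data[[...]] looks a column up by its string name;
  # !!yoy_name := ... names the result from a string.
  output <- output %>%
    mutate(!!yoy_name := .data[[new_name]] / .data[[old_name]] - 1)
}

# Keep only the identifier and the computed YOY columns.
output <- output %>% select(cat, starts_with("YOY"))
output
```

Because every YOY column is added to the same data frame, the per-month inner_join() from the original loop is no longer needed.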

Upvotes: 1
