Reputation: 1993
I am sorry if my title isn't great, if anyone have a better one, please enhance it.
I am currently struggling with the new dplyr 0.7 that enable to use variable inside my own functions using dplyr. I don't know if what I try isn't possible or if I take it the wrong way.
I have a simple data frame whith one line per company, and for each company a bunch of variables for the year 2015 and 2014 :
evo <- data.frame(id=1:5,
CA2015 = c(1200,1500,1550,200,0),
CA2014 = c(800,50,654,8555,0),
VA2015 = c(6984,6588,633,355,84),
VA2014 = c(35,358,358,1331,86))
I would like to be able to create, for each of my variable, an "evo" variable which is basically the value of 2015 minus the value of 2013.
In my example, I want to get a variable named evoCA which is equal to CA2015 - CA2014, and a variable named evoVA which is equal to VA2015 - VA2014.
(basically :
evo %>%
mutate(evoCA= CA2015 - CA2014,
evoVA = VA2015 - VA2014)
)
So I tried to make a function that would take into parameter the name of the variable I want to diff, but I can't make it work. Here's the best I can do :
addEvo <- function(table,var,var2014,var2015) {
var <- enquo(var)
var2014 <- enquo(var2014)
var2015 <- enquo(var2015)
evoName <- paste0("evo",var)[2]
table %>%
mutate(!!evoNom := (!!var2015) - (!!var2013)) %>%
return()
}
but it's not very clean because I have to pass not only the variable name, but the variable with its suffixes.
Any idea to make it better ?
Upvotes: 2
Views: 131
Reputation: 6552
using tidyr
you could transform your data to something more tidy.
If the years are always the same two years and the prefix is always before the years, something like like could add all the columns you want:
evo <- data.frame(id=1:5,
CA2015 = c(1200,1500,1550,200,0),
CA2014 = c(800,50,654,8555,0),
VA2015 = c(6984,6588,633,355,84),
VA2014 = c(35,358,358,1331,86))
library(dplyr, warn.conflicts = F)
library(tidyr, warn.conflicts = F)
evo %>%
gather(key = "type", value = "value", -id) %>%
separate(type, c("prefix", "year"), sep = -5) %>%
spread(year, value) %>%
mutate(evo = `2015` - `2014`) %>%
gather(key = "key", value = "value", -(id:prefix)) %>%
unite("type", prefix:key, sep = "") %>%
spread(type, value) %>%
select(id, ends_with("2015"), ends_with("2014"), ends_with("evo"))
#> id CA2015 VA2015 CA2014 VA2014 CAevo VAevo
#> 1 1 1200 6984 800 35 400 6949
#> 2 2 1500 6588 50 358 1450 6230
#> 3 3 1550 633 654 358 896 275
#> 4 4 200 355 8555 1331 -8355 -976
#> 5 5 0 84 0 86 0 -2
if the two years are different than 2014 and 2015, you could also make a function.
Upvotes: 1
Reputation: 590
Here's a solution that does not require writing a new function:
library(dplyr)
evo <- data.frame(id=1:5,
CA2015 = c(1200,1500,1550,200,0),
CA2014 = c(800,50,654,8555,0),
VA2015 = c(6984,6588,633,355,84),
VA2014 = c(35,358,358,1331,86))
# This creates two dataframes with the same properties and two diff years
evo2014 <- evo %>%
select(contains("2014"))
evo2015 <- evo %>%
select(contains("2015"))
# If both have the same amount of columns make a matrix subtraction and obtain the difference. Then bind the difference dataframe to the original df
if (ncol(evo2014) == ncol(evo2015)) {
evodiff <- as.matrix(evo2015 - evo2014)
varnames <- substr(colnames(evo2014), 1, 2)
colnames(evodiff) <- paste0("evo", varnames)
evo <- bind_cols(evo, as.data.frame(evodiff))
}
evo
#> id CA2015 CA2014 VA2015 VA2014 evoCA evoVA
#> 1 1 1200 800 6984 35 400 6949
#> 2 2 1500 50 6588 358 1450 6230
#> 3 3 1550 654 633 358 896 275
#> 4 4 200 8555 355 1331 -8355 -976
#> 5 5 0 0 84 86 0 -2
Upvotes: 2