Reputation: 130
Let's say I have a data frame that looks like this:
user_id date weight
12345 2016-03-07 160
12345 2016-03-06 158
12345 2016-03-05 156
12345 2016-03-04 154
I would like to mutate this data frame by applying multiple functions to the weight
column, then saving the results of each application to a new column. The one caveat is that this function is parameterized, and I'd like to append the parameter value to the new column name. For example, if I use lag()
:
user_id date weight dweight_1 dweight_2 ...
12345 2016-03-07 160 NA NA
12345 2016-03-06 158 160 NA
12345 2016-03-05 156 158 160
12345 2016-03-04 154 156 158
where the first new column is the result of lag(weight, 1)
, the second from lag(weight, 2)
, and so on.
I tried following the standard evaluation approach proposed in dplyr's vignette on the subject, as well as suggestions from this SO question, but neither seem to address the wrinkle of a parameterized function (otherwise I'd just use funs()
!).
How can I tackle this problem?
Upvotes: 2
Views: 324
Reputation: 883
You can use tidyquant::tq_mutate
to solve this problem.
Here is a toy example.
# Get Stock Prices from Yahoo! Finance
# Create a vector of stock symbols
FANG_symbols <- c("FB", "AMZN", "NFLX", "GOOG")
# Pass symbols to tq_get to get daily prices
FANG_data_d <- FANG_symbols %>%
tq_get(get = "stock.prices", from = "2014-01-01", to = "2016-12-31")
# Show the result
FANG_data_d
FANG_data_d %>%
select(symbol, date, adjusted) %>%
group_by(symbol) %>%
tq_mutate(
select = adjusted,
mutate_fun = lag.xts,
k = 1:5,
col_rename = column_names
)
Reference
Dancho, Matt. 2017. “Demo Week: Class(Monday) <- Tidyquant.” http://www.business-science.io/code-tools/2017/10/23/demo_week_tidyquant.html.
Upvotes: 0
Reputation: 43354
To add a single column programmatically with dplyr
, you could use something like
x <- 2
mutate_(df, .dots = setNames(list(~lag(weight, x)), paste0('dweight_', x)))
You could repeat that if it's just a couple times (even chain them together if you like), but if you're doing this a lot, it may make sense to write a function:
dweight <- function(l = 1){
for (i in l){
df <- mutate_(df, .dots = setNames(list(~lag(weight, i)), paste0('dweight_', i)))
}
df
}
which you can pass a vector:
> dweight(1:4)
user_id date weight dweight_1 dweight_2 dweight_3 dweight_4
1 12345 2016-03-07 160 NA NA NA NA
2 12345 2016-03-06 158 160 NA NA NA
3 12345 2016-03-05 156 158 160 NA NA
4 12345 2016-03-04 154 156 158 160 NA
You can edit that function as you like: add a dataframe parameter so you can chain it, use *apply
instead of for
, add a parameter to pass a function, etc. Go wild.
Upvotes: 3
Reputation: 1421
dplyr::mutate solution with standard evaluation:
tab %>% mutate_(.dots = setNames(lapply(1:4, function(i) lazyeval::interp(~lag(weight, i),
weight = as.name("weight"))), paste0("weight_", 1:4)))
# user_id date weight weight_1 weight_2 weight_3 weight_4
# 1 12345 2016-03-07 160 NA NA NA NA
# 2 12345 2016-03-06 158 160 NA NA NA
# 3 12345 2016-03-05 156 158 160 NA NA
# 4 12345 2016-03-04 154 156 158 160 NA
Edit: this is a bit tidier...
lags = 3
lag_weight <- function(i) lazyeval::interp(~lag(weight, i), weight = as.name("weight"))
tab %>% mutate_(.dots = setNames(lapply(1:lags, lag_weight), paste0('weight_', 1:lags)))
# user_id date weight weight_1 weight_2 weight_3 weight_4
# 1 12345 2016-03-07 160 NA NA NA NA
# 2 12345 2016-03-06 158 160 NA NA NA
# 3 12345 2016-03-05 156 158 160 NA NA
# 4 12345 2016-03-04 154 156 158 160 NA
Upvotes: 1
Reputation: 5532
Here is a solution that should work (there may be a cleaner way though)
# Assuming lag_vals is set as follows
lag_vals <- 1:3
names(lag_vals) <- paste0('dweight_', 1:3)
df_new <- cbind(df, sapply(lag_vals, function(x) { x=lag(df$weight, x) }))
df_new
## user_id date weight dweight_1 dweight_2 dweight_3
## 1 12345 2016-03-07 160 NA NA NA
## 2 12345 2016-03-06 158 160 NA NA
## 3 12345 2016-03-05 156 158 160 NA
## 4 12345 2016-03-04 154 156 158 160
Upvotes: 1