Reputation: 110
I have a data.frame mydata
like
index x y z
<int> <int> <int> <int>
1 1 2 3
2 3 4 5
3 3 4 9
....
What I like to do is to apply the same function fun(input, par1, par2, par3)
on columns (x, y, z), to modify them to new values (replace the original column)
The problem is that for any one of the column, it has its own parameters set, i.e. par1, par2, par3
, which was set in another table mypar
name par1 par2 par3
<chr> <dbl> <dbl> <dbl>
x 0.1 0.2 0.1
y 0.5 0.1 0.7
z 0.3 0.9 0.5
if the parameters for all columns x, y, z
are the same,say [0.1, 0.2, 0.3]
, I can use dplyr mutate_at
or data.table .SDcol
names = c("x", "y", "z")
mydata %>% mutate_at(names, ~fun(., 0.1, 0.2, 0.3))
Or
mydata.dt[, (names) := lapply(.SD, fun, 0.1, 0.2, 0.3), .SDcol = names]
Not, I want to integrate the mypar
information, parameter values corresponding to different column, to such process. Is there any way to do that in tidyverse
way or data.table
?
Thank you.
Updated:
@IceCreamToucan @akrun, thank both of you.
I have tested both the gather
/cast
and map2_df
methods.
And I found something interesting.
In my case, fun()
accepts a vector of length N, and returns a vector of length N, it is a window
function.
For example, if I change the function in your answer as fun1 <- function(x, p1, p2, p3) x/cumsum(x) + p1/p2 -p3
, only the map2_df
gives the correct result.
More complicated, by using the map2_df
, I'd like keep other columns beside x, y, z
in the final results, and also may need to be done on groups. Such as the original data as:
(final result will have all columns, but with updated x, y, z
index group x y z others
<int> <fct> <int> <int> <int> <dbl>
1 a 1 2 3 1.2
2 a 3 4 5 3.4
3 a 3 4 9 4.5
1 b 5 2 1 5.5
2 b 4 3 9 3.9
3 b 2 9 1 2.9
....
Upvotes: 1
Views: 752
Reputation: 25208
How about retrieving the parameters as required?
setkey(mypar, name)
cols <- c('x','y','z')
mydata[, paste0(cols, "_new") := lapply(cols,
function(x) fun1(.SD[[x]], mypar[x]$par1, mypar[x]$par2, mypar[x]$par3)),
.SDcols=cols]
output:
index x y z x_new y_new z_new
1: 1 1 2 3 1.4000000 5.300000 0.8333333
2: 2 3 4 5 1.1500000 4.966667 0.4583333
3: 3 3 4 9 0.8285714 4.700000 0.3627451
data:
library(data.table)
mypar <- fread("name par1 par2 par3
x 0.1 0.2 0.1
y 0.5 0.1 0.7
z 0.3 0.9 0.5")
mydata <- fread("index x y z
1 1 2 3
2 3 4 5
3 3 4 9")
fun1 <- function(x, p1, p2, p3) x/cumsum(x) + p1/p2 -p3
Upvotes: 1
Reputation: 887991
We gather
the original dataset into 'long' format, then do a left_join
with the 'mypar', transmute
to create a 'newcol' based on the function and then spread
to 'wide' format
f1 <- function(x, p1, p2, p3) x + p1/p2 - p3
library(tidyverse)
gather(mydata, name, val, -index) %>%
left_join(mypar) %>%
transmute(index, name, newcol =
f1(x = val, p1 = par1, p2 = par2, p3 = par3)) %>%
spread(name, newcol)
# index x y z
#1 1 1.4 6.3 2.833333
#2 2 3.4 8.3 4.833333
#3 3 3.4 8.3 8.833333
Or using map
map2_df(mydata %>%
select(mypar$name),
map(mypar$name, ~ mypar %>%
slice(match(.x, name)) %>%
select(-name)), ~ f1(.x, .y[[1]], .y[[2]], .y[[3]]))
v1 <- c(0.1, 0.2, 0.3)
mydata <- structure(list(index = 1:3, x = c(1L, 3L, 3L), y = c(2L, 4L,
4L), z = c(3L, 5L, 9L)), row.names = c(NA, -3L), class = "data.frame")
mypar <- structure(list(name = c("x", "y", "z"), par1 = c(0.1, 0.5, 0.3
), par2 = c(0.2, 0.1, 0.9), par3 = c(0.1, 0.7, 0.5)), row.names = c(NA,
-3L), class = "data.frame")
Upvotes: 2
Reputation: 28705
If you melt mydata
to long format you can join with mypar
to add the paramaters as columns. Then you can Map
over the columns of the resulting data table, since you now have the paramaters as columns. After that you can use dcast
to put the data back in wide format.
library(data.table)
setDT(mypar)
setDT(mydata)
setnames(mypar, 'name', 'variable')
long_out <-
merge(melt(mydata, 1), mypar, by = 'variable')[,
fun_out := Map(fun, value, par1, par2, par3)]
dcast(long_out, index ~ variable, value.var = 'fun_out')
# index x y z
# 1: 1 1.4 6.3 2.833333
# 2: 2 3.4 8.3 4.833333
# 3: 3 3.4 8.3 8.833333
Data used
fun <- function(x, p1, p2, p3) x + p1/p2 -p3
mypar <- fread('
name par1 par2 par3
x 0.1 0.2 0.1
y 0.5 0.1 0.7
z 0.3 0.9 0.5
')
mydata <- fread('
index x y z
1 1 2 3
2 3 4 5
3 3 4 9
')
Upvotes: 3