FENG QI
FENG QI

Reputation: 110

Using functions of multiple columns in a data.frame (data.table) with multiple parameters

I have a data.frame mydata like

index     x     y     z
<int> <int> <int> <int>
    1     1     2     3
    2     3     4     5
    3     3     4     9
    ....

What I like to do is to apply the same function fun(input, par1, par2, par3) on columns (x, y, z), to modify them to new values (replace the original column)

The problem is that for any one of the column, it has its own parameters set, i.e. par1, par2, par3, which was set in another table mypar

 name  par1  par2  par3
<chr> <dbl> <dbl> <dbl>
    x   0.1   0.2   0.1 
    y   0.5   0.1   0.7
    z   0.3   0.9   0.5

if the parameters for all columns x, y, z are the same,say [0.1, 0.2, 0.3], I can use dplyr mutate_at or data.table .SDcol

 names = c("x", "y", "z")
 mydata %>%  mutate_at(names, ~fun(., 0.1, 0.2, 0.3))

Or

 mydata.dt[, (names) := lapply(.SD, fun, 0.1, 0.2, 0.3), .SDcol = names]

Not, I want to integrate the mypar information, parameter values corresponding to different column, to such process. Is there any way to do that in tidyverse way or data.table?

Thank you.

Updated:

@IceCreamToucan @akrun, thank both of you.

I have tested both the gather/cast and map2_df methods.

And I found something interesting.

In my case, fun() accepts a vector of length N, and returns a vector of length N, it is a window function.

For example, if I change the function in your answer as fun1 <- function(x, p1, p2, p3) x/cumsum(x) + p1/p2 -p3, only the map2_df gives the correct result.

More complicated, by using the map2_df, I'd like keep other columns beside x, y, z in the final results, and also may need to be done on groups. Such as the original data as: (final result will have all columns, but with updated x, y, z

index  group      x     y     z  others
<int>  <fct>  <int> <int> <int>   <dbl>
    1      a      1     2     3     1.2
    2      a      3     4     5     3.4
    3      a      3     4     9     4.5
    1      b      5     2     1     5.5
    2      b      4     3     9     3.9
    3      b      2     9     1     2.9
    .... 

Upvotes: 1

Views: 752

Answers (3)

chinsoon12
chinsoon12

Reputation: 25208

How about retrieving the parameters as required?

setkey(mypar, name)
cols <- c('x','y','z')
mydata[, paste0(cols, "_new") := lapply(cols, 
        function(x) fun1(.SD[[x]], mypar[x]$par1, mypar[x]$par2, mypar[x]$par3)), 
    .SDcols=cols]

output:

   index x y z     x_new    y_new     z_new
1:     1 1 2 3 1.4000000 5.300000 0.8333333
2:     2 3 4 5 1.1500000 4.966667 0.4583333
3:     3 3 4 9 0.8285714 4.700000 0.3627451

data:

library(data.table)
mypar <- fread("name  par1  par2  par3
x   0.1   0.2   0.1 
y   0.5   0.1   0.7
z   0.3   0.9   0.5")

mydata <- fread("index     x     y     z
1     1     2     3
2     3     4     5
3     3     4     9")

fun1 <- function(x, p1, p2, p3) x/cumsum(x) + p1/p2 -p3

Upvotes: 1

akrun
akrun

Reputation: 887991

We gather the original dataset into 'long' format, then do a left_join with the 'mypar', transmute to create a 'newcol' based on the function and then spread to 'wide' format

f1 <- function(x, p1, p2, p3) x + p1/p2 - p3
library(tidyverse)
gather(mydata, name, val,  -index) %>%
    left_join(mypar) %>% 
     transmute(index, name, newcol = 
        f1(x = val, p1 = par1, p2 = par2, p3 = par3)) %>%
     spread(name, newcol)
# index   x   y        z
#1     1 1.4 6.3 2.833333
#2     2 3.4 8.3 4.833333
#3     3 3.4 8.3 8.833333

Or using map

map2_df(mydata %>%
            select(mypar$name),
       map(mypar$name, ~  mypar %>%
            slice(match(.x, name)) %>% 
            select(-name)), ~ f1(.x, .y[[1]], .y[[2]], .y[[3]]))

data

v1 <- c(0.1, 0.2, 0.3)
mydata <- structure(list(index = 1:3, x = c(1L, 3L, 3L), y = c(2L, 4L, 
4L), z = c(3L, 5L, 9L)), row.names = c(NA, -3L), class = "data.frame")

mypar <- structure(list(name = c("x", "y", "z"), par1 = c(0.1, 0.5, 0.3
), par2 = c(0.2, 0.1, 0.9), par3 = c(0.1, 0.7, 0.5)), row.names = c(NA, 
-3L), class = "data.frame")

Upvotes: 2

IceCreamToucan
IceCreamToucan

Reputation: 28705

If you melt mydata to long format you can join with mypar to add the paramaters as columns. Then you can Map over the columns of the resulting data table, since you now have the paramaters as columns. After that you can use dcast to put the data back in wide format.

library(data.table)
setDT(mypar)
setDT(mydata)
setnames(mypar, 'name', 'variable')

long_out <- 
  merge(melt(mydata, 1), mypar, by = 'variable')[,
    fun_out := Map(fun, value, par1, par2, par3)]

dcast(long_out, index ~ variable, value.var = 'fun_out')
#    index   x   y        z
# 1:     1 1.4 6.3 2.833333
# 2:     2 3.4 8.3 4.833333
# 3:     3 3.4 8.3 8.833333

Data used

fun <- function(x, p1, p2, p3) x + p1/p2 -p3

mypar <- fread('
 name  par1  par2  par3
    x   0.1   0.2   0.1 
    y   0.5   0.1   0.7
    z   0.3   0.9   0.5
')

mydata <- fread('
index     x     y     z
    1     1     2     3
    2     3     4     5
    3     3     4     9
')

Upvotes: 3

Related Questions