Omry Atia
Omry Atia

Reputation: 2443

dplyr's mutate at each column separately with a custom function of several parametrs

I have the following function which I would like to apply to a data frame's columns:

ff <- function(w, epsi, df) {
  res <- w*(max(df, na.rm = T) - min(df, na.rm=T)+2*epsi)+min(df, na.rm = T) - epsi
  return(res)
}

the function should be applied to each column of w, using the parameters epsi and df. The function has to take each value of w and multiply it by the max of the corresponding column of df minus the min of that column of df, and so on.

For example:

> w
# A tibble: 5 x 2
    A       B
  <dbl>   <dbl>
1 0.290  0.928  
2 0.917  0.929  
3 0.910  0.919  
4 0.243  0.908  
5 0.936  0.901  

and df is:

> df
# A tibble: 10 x 2
   A     B
  <dbl> <dbl>
1 0.977 1.03 
2 1.04  1.15 
3 0.929 0.875
4 1.12  1.15 
5 0.913 1.05 
6 1.00  1.09 
7 0.972 1.03 
8 0.919 1.04 
9 0.935 0.973
10 1.08  1.17 

So for the first element of w, 0.290, the function multiplies 0.290 by the max of df in column A (1.12), and so on.

How can I apply the function using dplyr to w?

I tried:

w_new = w %>%  mutate_each(ff(w,0.001, df))

But instead of accepting only one column at a time, it tries to operate on all the columns at once.

Upvotes: 2

Views: 838

Answers (2)

Maurits Evers
Maurits Evers

Reputation: 50668

Here is base R solution using mapply

mapply(
    function(x, y, epsi = 0.001)
        x * (max(y, na.rm = T) - min(y, na.rm = T) + 2 * epsi) + min(y, na.rm = T) - epsi,
    w, df)
#            A        B
#[1,] 0.972610 1.149616
#[2,] 1.103653 1.149913
#[3,] 1.102190 1.146943
#[4,] 0.962787 1.143676
#[5,] 1.107624 1.141597

Explanation: mapply applies the function to w and df column by column, and simplifies the result to a 5x2 matrix.

Redefining ff slightly, this can be written more succinctly as

ff <- function(x, y, epsi = 0.001)
    x * (max(y, na.rm = T) - min(y, na.rm = T) + 2 * epsi) + min(y, na.rm = T) - epsi
mapply(ff, w, df)

or using purrr::map2_df as

w %>% map2_df(df, ff)
## A tibble: 5 x 2
#      A     B
#  <dbl> <dbl>
#1 0.973  1.15
#2 1.10   1.15
#3 1.10   1.15
#4 0.963  1.14
#5 1.11   1.14

Update

Results from a very quick&dirty microbenchmark analysis of the mutate_all and map2 approaches look like this:

res <- microbenchmark(
    map2 = {
        ff <- function(x, y, epsi = 0.001)
            x * (max(y, na.rm = T) - min(y, na.rm = T) + 2 * epsi) + min(y, na.rm = T) - epsi
        w %>% map2_df(df, ff)
    },
    mutate_all = {
        ff <- function(x, epsi, colName) {
            res <- x*(max(df[,colName], na.rm = TRUE) - min(df[,colName], na.rm=TRUE) +
         2*epsi)+min(df[,colName], na.rm = TRUE) - epsi
            return(res)
        }
        w %>% mutate_all(funs(ff(., 0.001, quo_name(quo(.)) )))
    }
)
res
#Unit: microseconds
#       expr      min        lq      mean    median        uq       max neval
#       map2  320.537  371.1365  495.7786  397.6755  449.4445  8599.661   100
# mutate_all 1916.788 1998.2105 2312.5878 2059.7650 2290.1415 11169.320   100

library(ggplot2)
autoplot(res)

enter image description here

Upvotes: 1

MKR
MKR

Reputation: 20085

OP has expressed desire to use dplyr based solution for this problem hence I thought to provide an answer using dplyr.

In the situations where we need to look for a corresponding column (name) in another table then its always better not to rely on order of columns ranter use specific column names. The quo_name and quo functions provides access to column name in scope of dplyr::mutate_all function.

A solution using dplyr::mutate_all can be as:

# Re-wirte fucntion to accept the column name for df
ff <- function(x, epsi, colName) {
  res <- x*(max(df[,colName], na.rm = TRUE) - min(df[,colName], na.rm=TRUE) +
         2*epsi)+min(df[,colName], na.rm = TRUE) - epsi
  return(res)
}

library(dplyr)

# The corresponding column names is passed to ff using quo_name(quo(.))
w %>% mutate_all(funs(ff(., 0.001, quo_name(quo(.)) )))
#          A        B
# 1 0.972610 1.149616
# 2 1.103653 1.149913
# 3 1.102190 1.146943
# 4 0.962787 1.143676
# 5 1.107624 1.141597

Data:

w <- read.table(text = 
"A       B
1 0.290  0.928  
2 0.917  0.929  
3 0.910  0.919  
4 0.243  0.908  
5 0.936  0.901",
header = TRUE)


df <- read.table(text = 
"A     B
1 0.977 1.03 
2 1.04  1.15 
3 0.929 0.875
4 1.12  1.15 
5 0.913 1.05 
6 1.00  1.09 
7 0.972 1.03 
8 0.919 1.04 
9 0.935 0.973
10 1.08  1.17",
header = TRUE)

Upvotes: 2

Related Questions