Reputation: 2443
I have the following function which I would like to apply to a data frame's columns:
ff <- function(w, epsi, df) {
res <- w*(max(df, na.rm = T) - min(df, na.rm=T)+2*epsi)+min(df, na.rm = T) - epsi
return(res)
}
the function should be applied to each column of w
, using the parameters epsi
and df
. The function has to take each value of w
and multiply it by the max of the corresponding column of df
minus the min of that column of df
, and so on.
For example:
> w
# A tibble: 5 x 2
A B
<dbl> <dbl>
1 0.290 0.928
2 0.917 0.929
3 0.910 0.919
4 0.243 0.908
5 0.936 0.901
and df
is:
> df
# A tibble: 10 x 2
A B
<dbl> <dbl>
1 0.977 1.03
2 1.04 1.15
3 0.929 0.875
4 1.12 1.15
5 0.913 1.05
6 1.00 1.09
7 0.972 1.03
8 0.919 1.04
9 0.935 0.973
10 1.08 1.17
So for the first element of w
, 0.290, the function multiplies 0.290 by the max of df
in column A
(1.12), and so on.
How can I apply the function using dplyr
to w
?
I tried:
w_new = w %>% mutate_each(ff(w,0.001, df))
But instead of accepting only one column at a time, it tries to operate on all the columns at once.
Upvotes: 2
Views: 838
Reputation: 50668
Here is base R solution using mapply
mapply(
function(x, y, epsi = 0.001)
x * (max(y, na.rm = T) - min(y, na.rm = T) + 2 * epsi) + min(y, na.rm = T) - epsi,
w, df)
# A B
#[1,] 0.972610 1.149616
#[2,] 1.103653 1.149913
#[3,] 1.102190 1.146943
#[4,] 0.962787 1.143676
#[5,] 1.107624 1.141597
Explanation: mapply
applies the function to w
and df
column by column, and simplifies the result to a 5x2 matrix
.
Redefining ff
slightly, this can be written more succinctly as
ff <- function(x, y, epsi = 0.001)
x * (max(y, na.rm = T) - min(y, na.rm = T) + 2 * epsi) + min(y, na.rm = T) - epsi
mapply(ff, w, df)
or using purrr::map2_df
as
w %>% map2_df(df, ff)
## A tibble: 5 x 2
# A B
# <dbl> <dbl>
#1 0.973 1.15
#2 1.10 1.15
#3 1.10 1.15
#4 0.963 1.14
#5 1.11 1.14
Results from a very quick&dirty microbenchmark
analysis of the mutate_all
and map2
approaches look like this:
res <- microbenchmark(
map2 = {
ff <- function(x, y, epsi = 0.001)
x * (max(y, na.rm = T) - min(y, na.rm = T) + 2 * epsi) + min(y, na.rm = T) - epsi
w %>% map2_df(df, ff)
},
mutate_all = {
ff <- function(x, epsi, colName) {
res <- x*(max(df[,colName], na.rm = TRUE) - min(df[,colName], na.rm=TRUE) +
2*epsi)+min(df[,colName], na.rm = TRUE) - epsi
return(res)
}
w %>% mutate_all(funs(ff(., 0.001, quo_name(quo(.)) )))
}
)
res
#Unit: microseconds
# expr min lq mean median uq max neval
# map2 320.537 371.1365 495.7786 397.6755 449.4445 8599.661 100
# mutate_all 1916.788 1998.2105 2312.5878 2059.7650 2290.1415 11169.320 100
library(ggplot2)
autoplot(res)
Upvotes: 1
Reputation: 20085
OP has expressed desire to use dplyr
based solution for this problem hence I thought to provide an answer using dplyr
.
In the situations where we need to look for a corresponding column (name) in another table then its always better not to rely on order of columns ranter use specific column names. The quo_name
and quo
functions provides access to column name in scope of dplyr::mutate_all
function.
A solution using dplyr::mutate_all
can be as:
# Re-wirte fucntion to accept the column name for df
ff <- function(x, epsi, colName) {
res <- x*(max(df[,colName], na.rm = TRUE) - min(df[,colName], na.rm=TRUE) +
2*epsi)+min(df[,colName], na.rm = TRUE) - epsi
return(res)
}
library(dplyr)
# The corresponding column names is passed to ff using quo_name(quo(.))
w %>% mutate_all(funs(ff(., 0.001, quo_name(quo(.)) )))
# A B
# 1 0.972610 1.149616
# 2 1.103653 1.149913
# 3 1.102190 1.146943
# 4 0.962787 1.143676
# 5 1.107624 1.141597
Data:
w <- read.table(text =
"A B
1 0.290 0.928
2 0.917 0.929
3 0.910 0.919
4 0.243 0.908
5 0.936 0.901",
header = TRUE)
df <- read.table(text =
"A B
1 0.977 1.03
2 1.04 1.15
3 0.929 0.875
4 1.12 1.15
5 0.913 1.05
6 1.00 1.09
7 0.972 1.03
8 0.919 1.04
9 0.935 0.973
10 1.08 1.17",
header = TRUE)
Upvotes: 2