TYL

Reputation: 1637

Applying function using dplyr and setting output as columns in dataframe

I have a huge dataframe. I am applying a function with multiple outputs to one column, and I would like to add these outputs as new columns in the dataframe.

Example function:

measure <- function(x){ # useless function for illustrative purposes
     one <- x+1
     two <- x^2
     three <- x/2
     m <- c(one,two,three)
     names(m) <- c('Plus1','Square','Half')
     return(m)
 } 
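For reference, here is a minimal sketch of what this function returns for a vector input. The input values `c(2, 4)` are made up for illustration. Because `c(one, two, three)` concatenates the three result vectors end to end, the result is three times as long as `x`, and only its first three elements receive names.

```r
# Same function as above, repeated so the snippet runs on its own
measure <- function(x){
  one <- x + 1
  two <- x^2
  three <- x / 2
  m <- c(one, two, three)
  names(m) <- c('Plus1', 'Square', 'Half')
  return(m)
}

m <- measure(c(2, 4))  # illustrative input, not from the real data
unname(m)              # 3 5 4 16 1 2
names(m)               # "Plus1" "Square" "Half" NA NA NA
m[2]                   # 5, i.e. the second element of x + 1
```

With a vector of length two or more, `measure(x)[2]` is therefore the second element of `x + 1` rather than a squared value, even though it carries the name `Square`.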

My current method, which is very inefficient:

library(dplyr)
a <- mtcars %>% group_by(cyl) %>%
  mutate(Plus1 = measure(wt)[1], Square = measure(wt)[2], Half = measure(wt)[3]) %>%
  as.data.frame()

Output:

head(a,15)

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb Plus1 Square  Half
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  3.62  3.875 4.215
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  3.62  3.875 4.215
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1  3.32  4.190 4.150
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1  3.62  3.875 4.215
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2  4.44  4.570 5.070
6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1  3.62  3.875 4.215
7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4  4.44  4.570 5.070
8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2  3.32  4.190 4.150
9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2  3.32  4.190 4.150
10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4  3.62  3.875 4.215
11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4  3.62  3.875 4.215
12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3  4.44  4.570 5.070
13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3  4.44  4.570 5.070
14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3  4.44  4.570 5.070
15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4  4.44  4.570 5.070

Is there a more efficient way to do this? My actual function has 13 outputs, and it takes a very long time to apply to my large dataframe. Please help!

Upvotes: 0

Views: 48

Answers (1)

Ronak Shah

Reputation: 388982

There are several ways to solve this. One option is to return a tibble from the function, split the dataframe by group, compute the statistics for each group, and bind the results together.

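library(tidyverse)

# Return one tibble with a column per output, so a single call yields every result
measure <- function(x){
   tibble(Plus1 = x + 1, Square = x^2, Half = x / 2)
}

# Sort by cyl so the rows line up with the groups returned by group_split()
bind_cols(mtcars %>% arrange(cyl),
          mtcars %>%
              group_split(cyl) %>%
              map_df(~measure(.$wt)))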

#    mpg cyl  disp  hp drat    wt  qsec vs am gear carb Plus1    Square   Half
#1  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 3.320  5.382400 1.1600
#2  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 4.190 10.176100 1.5950
#3  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 4.150  9.922500 1.5750
#4  32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1 3.200  4.840000 1.1000
#5  30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2 2.615  2.608225 0.8075
#6  33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 2.835  3.367225 0.9175
#....

This calls measure only once per group, regardless of how many values it returns, whereas your attempt calls it n times to extract n values.
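Another possible variant, sketched here under the assumption that dplyr 1.0.0 or later is installed: when an unnamed expression inside `mutate()` returns a data frame, its columns are added to the result directly. This also calls the function once per group, and it keeps the original row order, so no `arrange()` or `bind_cols()` step is needed.

```r
library(dplyr)

# Tibble-returning helper, repeated so the snippet runs on its own
measure <- function(x) {
  tibble(Plus1 = x + 1, Square = x^2, Half = x / 2)
}

result <- mtcars %>%
  group_by(cyl) %>%
  mutate(measure(wt)) %>%  # unnamed data-frame result becomes new columns
  ungroup()

head(result)
```

The result is a tibble, so the row names of `mtcars` are dropped. Add `as.data.frame()` at the end if a plain data frame is needed.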

Upvotes: 1
