Reputation: 1637
I have a huge data frame. I am applying a function with multiple outputs to one column, and I would like to add those outputs as new columns in the data frame.
Example function:
measure <- function(x) { # useless function for illustrative purposes
  one <- x + 1
  two <- x^2
  three <- x / 2
  m <- c(one, two, three)
  names(m) <- c('Plus1', 'Square', 'Half')
  return(m)
}
My current method, which is very inefficient:
a <- mtcars %>%
  group_by(cyl) %>%
  mutate(Plus1 = measure(wt)[1], Square = measure(wt)[2], Half = measure(wt)[3]) %>%
  as.data.frame()
Output:
head(a,15)
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb Plus1 Square  Half
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  3.62  3.875 4.215
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  3.62  3.875 4.215
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1  3.32  4.190 4.150
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1  3.62  3.875 4.215
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2  4.44  4.570 5.070
6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1  3.62  3.875 4.215
7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4  4.44  4.570 5.070
8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2  3.32  4.190 4.150
9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2  3.32  4.190 4.150
10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4  3.62  3.875 4.215
11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4  3.62  3.875 4.215
12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3  4.44  4.570 5.070
13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3  4.44  4.570 5.070
14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3  4.44  4.570 5.070
15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4  4.44  4.570 5.070
Is there any more efficient way to do this? My actual function has 13 outputs and it is taking very long to apply to my large dataframe. Please help!
Upvotes: 0
Views: 48
Reputation: 388982
There are several ways to solve this. One option is to have the function return a tibble, split the data frame by group, compute the statistics for each group, and bind the results together.
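library(tidyverse)

# Return one column per output instead of a single concatenated vector
measure <- function(x) {
  tibble(Plus1 = x + 1, Square = x^2, Half = x / 2)
}

# Sort by cyl so the rows line up with the per-group results
bind_cols(
  mtcars %>% arrange(cyl),
  mtcars %>%
    group_split(cyl) %>%      # one data frame per cyl group
    map_df(~ measure(.$wt))   # measure() runs once per group
)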
#   mpg cyl  disp hp drat    wt  qsec vs am gear carb Plus1    Square   Half
#1 22.8   4 108.0 93 3.85 2.320 18.61  1  1    4    1 3.320  5.382400 1.1600
#2 24.4   4 146.7 62 3.69 3.190 20.00  1  0    4    2 4.190 10.176100 1.5950
#3 22.8   4 140.8 95 3.92 3.150 22.90  1  0    4    2 4.150  9.922500 1.5750
#4 32.4   4  78.7 66 4.08 2.200 19.47  1  1    4    1 3.200  4.840000 1.1000
#5 30.4   4  75.7 52 4.93 1.615 18.52  1  1    4    2 2.615  2.608225 0.8075
#6 33.9   4  71.1 65 4.22 1.835 19.90  1  1    4    1 2.835  3.367225 0.9175
#....
This calls measure only once per group, regardless of how many values it returns. In your attempt, by contrast, it was called n times to extract n values.
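As a variation on the same idea, here is a minimal sketch that stays inside a single dplyr pipeline. It assumes dplyr 1.0.0 or later, where an unnamed data-frame result inside mutate() is spliced into separate columns. The name res is only illustrative. Unlike the split-and-bind version, this keeps the original row order of mtcars, so no arrange(cyl) is needed.

```r
library(dplyr)

# Same illustrative function as above: one column per output.
measure <- function(x) {
  tibble(Plus1 = x + 1, Square = x^2, Half = x / 2)
}

# Assumption: dplyr >= 1.0.0. The unnamed tibble returned by measure()
# is unpacked into the columns Plus1, Square and Half.
# measure() is evaluated once per cyl group.
res <- mtcars %>%
  group_by(cyl) %>%
  mutate(measure(wt)) %>%
  ungroup() %>%
  as.data.frame()

head(res)
```

With 13 outputs, only the tibble() call inside the function grows; the pipeline itself stays the same.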
Upvotes: 1