Reputation: 846
I have this data:
df_1 <- data.frame(
x = replicate(4, runif(30, 20, 100)),
y = sample(1:3, 30, replace = TRUE)
)
The following code works:
library(tidyverse)
df_1 %>%
select(-y) %>%
rowwise() %>%
mutate(var = sum(c(x.1, x.3)))
But the following attempts, which should use all variables, don't work.
With the . placeholder:
df_1 %>%
select(-y) %>%
rowwise() %>%
mutate(var = sum(.))
With select_if:
df_1 %>%
select(-y) %>%
rowwise() %>%
mutate(var = sum(select_if(., is.numeric)))
Both methods return:
Source: local data frame [30 x 5]
Groups: <by row>
# A tibble: 30 x 5
x.1 x.2 x.3 x.4 var
<dbl> <dbl> <dbl> <dbl> <dbl>
1 32.7 42.7 50.1 20.8 7091.
2 75.9 71.3 83.6 77.6 7091.
3 49.6 28.7 97.0 59.7 7091.
4 47.4 96.1 31.9 79.7 7091.
5 54.2 47.1 81.7 41.6 7091.
6 27.9 58.1 97.4 25.9 7091.
7 61.8 78.3 52.6 67.7 7091.
8 85.4 51.3 38.8 82.0 7091.
9 27.9 72.6 68.9 25.2 7091.
10 87.2 42.1 27.6 73.9 7091.
# ... with 20 more rows
Here 7091 is an incorrect sum, repeated in every row.
How can I fix these calls so that the sum is computed per row?
Upvotes: 2
Views: 1140
Reputation: 10921
This is a tricky problem since dplyr operates column-wise for many operations. I originally used apply from base R to apply over rows, but apply is problematic when handling character and numeric types.
Instead we can use (the aging) plyr and adply to do this simply, since plyr lets us treat a one-row data frame as a vector:
# plyr:: prefix avoids attaching plyr on top of dplyr
df_1 %>% select(-y) %>% plyr::adply(1, function(df) c(v1 = sd(df[1, ])))
Note that some functions, like var, won't work on a one-row data frame, so we need to convert it to a vector using as.numeric.
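To make the as.numeric point concrete, here is a minimal base-R sketch (not the plyr call above; the names num, bad and row_var are made up for illustration). It assumes the df_1 from the question, with a seed added only for reproducibility:

```r
set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# keep only the x.* columns
num <- df_1[, names(df_1) != "y"]

# var() treats a one-row data frame as a matrix with a single observation,
# so it does not return one number for the row
bad <- var(num[1, ])

# converting each one-row slice to a plain numeric vector first
# gives one variance per row
row_var <- vapply(
  seq_len(nrow(num)),
  function(i) var(as.numeric(num[i, ])),
  numeric(1)
)
head(row_var)
```

The same as.numeric(df[1, ]) conversion can be dropped into the function passed to adply.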
Upvotes: 1
Reputation: 13691
This can be done using purrr::pmap, which passes a list of arguments to a function that accepts "dots". Since most functions like mean, sd, etc. work with vectors, you need to pair the call with a domain lifter:
df_1 %>% select(-y) %>% mutate( var = pmap(., lift_vd(mean)) )
# x.1 x.2 x.3 x.4 var
# 1 70.12072 62.99024 54.00672 86.81358 68.48282
# 2 49.40462 47.00752 21.99248 78.87789 49.32063
df_1 %>% select(-y) %>% mutate( var = pmap(., lift_vd(sd)) )
# x.1 x.2 x.3 x.4 var
# 1 70.12072 62.99024 54.00672 86.81358 13.88555
# 2 49.40462 47.00752 21.99248 78.87789 23.27958
The function sum accepts dots directly, so you don't need to lift its domain:
df_1 %>% select(-y) %>% mutate( var = pmap(., sum) )
# x.1 x.2 x.3 x.4 var
# 1 70.12072 62.99024 54.00672 86.81358 273.9313
# 2 49.40462 47.00752 21.99248 78.87789 197.2825
Everything conforms to the standard dplyr data processing, so all three can be combined as separate arguments to mutate:
df_1 %>% select(-y) %>%
mutate( v1 = pmap(., lift_vd(mean)),
v2 = pmap(., lift_vd(sd)),
v3 = pmap(., sum) )
# x.1 x.2 x.3 x.4 v1 v2 v3
# 1 70.12072 62.99024 54.00672 86.81358 68.48282 13.88555 273.9313
# 2 49.40462 47.00752 21.99248 78.87789 49.32063 23.27958 197.2825
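One caveat for readers on newer versions: the lift_*() helpers were deprecated in purrr 1.0.0, and plain pmap returns a list column rather than a numeric one. A rough equivalent, as a sketch only, uses pmap_dbl with a small wrapper that gathers the dots into a vector. The helper name row_fn is made up here and is not part of purrr:

```r
library(dplyr)
library(purrr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# hypothetical helper: turn a function of one vector into a function of dots
row_fn <- function(f) function(...) f(c(...))

res <- df_1 %>%
  select(-y) %>%
  mutate(
    v1 = pmap_dbl(., row_fn(mean)),  # row-wise mean
    v2 = pmap_dbl(., row_fn(sd)),    # row-wise standard deviation
    v3 = pmap_dbl(., sum)            # sum already accepts dots
  )
head(res, 2)
```

pmap_dbl also fails loudly if the function ever returns something other than a single number, which plain pmap would silently store in the list.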
Upvotes: 4
Reputation: 5405
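A few approaches I've taken in the past:
- a pre-existing row-wise function (e.g. rowSums)
- reduce (which doesn't apply to all functions)
- transposing and then mapping over the columns
- a custom function with pmap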
set.seed(1)
df_1 <- data.frame(
x = replicate(4, runif(30, 20, 100)),
y = sample(1:3, 30, replace = TRUE)
)
library(tidyverse)
# rowSums
df_1 %>%
mutate(var = rowSums(select(., -y))) %>%
head()
#> x.1 x.2 x.3 x.4 y var
#> 1 41.24069 58.56641 93.03007 39.17035 3 232.0075
#> 2 49.76991 67.96527 43.48827 24.71475 2 185.9382
#> 3 65.82827 59.48330 56.72526 71.38306 2 253.4199
#> 4 92.65662 34.89741 46.59157 90.10154 1 264.2471
#> 5 36.13455 86.18987 72.06964 82.31317 3 276.7072
#> 6 91.87117 73.47734 40.64134 83.78471 2 289.7746
# reduce
df_1 %>%
mutate(var = reduce(select(., -y),`+`)) %>%
head()
#> x.1 x.2 x.3 x.4 y var
#> 1 41.24069 58.56641 93.03007 39.17035 3 232.0075
#> 2 49.76991 67.96527 43.48827 24.71475 2 185.9382
#> 3 65.82827 59.48330 56.72526 71.38306 2 253.4199
#> 4 92.65662 34.89741 46.59157 90.10154 1 264.2471
#> 5 36.13455 86.18987 72.06964 82.31317 3 276.7072
#> 6 91.87117 73.47734 40.64134 83.78471 2 289.7746
# transpose, then map over the columns
df_1 %>%
mutate(var = select(., -y) %>% as.matrix %>% t %>% as.data.frame %>% map_dbl(var)) %>%
head()
#> x.1 x.2 x.3 x.4 y var
#> 1 41.24069 58.56641 93.03007 39.17035 3 620.95228
#> 2 49.76991 67.96527 43.48827 24.71475 2 318.37221
#> 3 65.82827 59.48330 56.72526 71.38306 2 43.17011
#> 4 92.65662 34.89741 46.59157 90.10154 1 878.50087
#> 5 36.13455 86.18987 72.06964 82.31317 3 520.72241
#> 6 91.87117 73.47734 40.64134 83.78471 2 506.16785
# custom function with pmap
my_var <- function(...){
vec <- c(...)
var(vec)
}
df_1 %>%
mutate(var = select(., -y) %>% pmap(my_var)) %>%
head()
#> x.1 x.2 x.3 x.4 y var
#> 1 41.24069 58.56641 93.03007 39.17035 3 620.9523
#> 2 49.76991 67.96527 43.48827 24.71475 2 318.3722
#> 3 65.82827 59.48330 56.72526 71.38306 2 43.17011
#> 4 92.65662 34.89741 46.59157 90.10154 1 878.5009
#> 5 36.13455 86.18987 72.06964 82.31317 3 520.7224
#> 6 91.87117 73.47734 40.64134 83.78471 2 506.1679
Created on 2019-04-30 by the reprex package (v0.2.1)
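On the caveat that reduce doesn't apply to all functions: as far as I can tell it works when the function is binary, vectorised, and can be folded pairwise across columns, such as `+` or pmax. Something like var cannot be built up two columns at a time, which is where the transpose or pmap approaches come in. A small sketch under those assumptions (the column names row_sum and row_max are made up):

```r
library(dplyr)
library(purrr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

res <- df_1 %>%
  mutate(
    row_sum = reduce(select(., -y), `+`),   # ((x.1 + x.2) + x.3) + x.4
    row_max = reduce(select(., -y), pmax)   # element-wise max, folded across columns
  )
head(res)
```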
Upvotes: 2
Reputation: 1757
I think this is tricky because the scoped variants of mutate (mutate_at, mutate_all, mutate_if) are generally aimed at executing a function on a specific column, instead of creating an operation that uses all columns.
The simplest solution I can come up with basically amounts to creating a vector (cols) that is then used to execute the summary operation:
library(dplyr)
library(purrr)
df_1 <- data.frame(
x = replicate(4, runif(30, 20, 100)),
y = sample(1:3, 30, replace = TRUE)
)
# create vector of columns to operate on
cols <- names(df_1)
cols <- cols[map_lgl(df_1, is.numeric)]
cols <- cols[! cols %in% c("y")]
cols
#> [1] "x.1" "x.2" "x.3" "x.4"
df_1 %>%
select(-y) %>%
rowwise() %>%
mutate(
var = sum(!!!map(cols, as.name), na.rm = TRUE)
)
#> Source: local data frame [30 x 5]
#> Groups: <by row>
#>
#> # A tibble: 30 x 5
#> x.1 x.2 x.3 x.4 var
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 46.1 28.9 28.9 50.7 155.
#> 2 26.8 68.0 67.1 26.5 188.
#> 3 35.2 63.8 62.5 28.5 190.
#> 4 31.3 44.9 67.3 68.2 212.
#> 5 52.6 23.9 83.2 43.4 203.
#> 6 55.7 92.8 86.3 57.2 292.
#> 7 56.9 50.0 77.6 25.6 210.
#> 8 95.0 82.6 86.1 22.7 286.
#> 9 62.7 26.5 61.0 88.9 239.
#> 10 65.2 23.1 25.5 51.0 165.
#> # … with 20 more rows
Created on 2019-04-30 by the reprex package (v0.2.1)
NOTE: if you are unfamiliar with purrr, you can also use something like lapply, etc.
You can read more about these trickier dplyr operations (!!, !!!, etc.) here:
https://dplyr.tidyverse.org/articles/programming.html
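For what it's worth, the reason sum(.) in the question returns the same number in every row is that . in the pipe is the whole data frame, so the grand total is computed regardless of rowwise(). If you are on dplyr 1.0.0 or later (an assumption; it postdates the reprex above), c_across() picks up the current row's values instead. A minimal sketch:

```r
library(dplyr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# what sum(.) computes: one grand total for the whole data frame
grand_total <- sum(select(df_1, -y))

res <- df_1 %>%
  select(-y) %>%
  rowwise() %>%
  mutate(var = sum(c_across(everything()))) %>%  # values of the current row only
  ungroup()                                      # drop the row-wise grouping afterwards
head(res)
```

With this form there is no need to build the cols vector by hand, though the !!! approach still works if you need an explicit column list.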
Upvotes: 2