neves

Reputation: 846

Apply `dplyr::rowwise` to all variables

I have the following data:

df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE)
)

The following code works:

library(tidyverse)

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(c(x.1, x.3)))

But the following attempts (for all variables) don't work:

with .:

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(.))

with select_if:

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(var = sum(select_if(., is.numeric)))

Both methods return:

Source: local data frame [30 x 5]
Groups: <by row>

# A tibble: 30 x 5
     x.1   x.2   x.3   x.4   var
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1  32.7  42.7  50.1  20.8 7091.
 2  75.9  71.3  83.6  77.6 7091.
 3  49.6  28.7  97.0  59.7 7091.
 4  47.4  96.1  31.9  79.7 7091.
 5  54.2  47.1  81.7  41.6 7091.
 6  27.9  58.1  97.4  25.9 7091.
 7  61.8  78.3  52.6  67.7 7091.
 8  85.4  51.3  38.8  82.0 7091.
 9  27.9  72.6  68.9  25.2 7091.
10  87.2  42.1  27.6  73.9 7091.
# ... with 20 more rows

where 7091 is an incorrect sum.

How can I fix these calls?
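For context, a minimal sketch of why both attempts repeat a single number: inside the pipe, `.` is the whole data frame handed to mutate(), not the current row, so sum(.) is the grand total of every cell. The data frame `toy` below is made up for illustration and is not the data above.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# `.` is the full data frame on the left of the pipe,
# so each row receives the same grand total
res <- toy %>%
  rowwise() %>%
  mutate(var = sum(.)) %>%
  ungroup()

# Grand total over all cells, for comparison
grand_total <- sum(as.matrix(toy))

print(res$var)
print(grand_total)
```

Every entry of `res$var` matches `grand_total`, which is the same pattern as the repeated 7091 in the output above.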

Upvotes: 2

Views: 1140

Answers (4)

qwr

Reputation: 10921

This is a tricky problem since dplyr operates column-wise for many operations. I originally used apply from base R to apply over rows, but apply converts the data frame to a matrix first, which is a problem when the data mixes character and numeric columns.
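A small sketch of the apply issue, using only base R. The data frame `toy` is a made-up example; the point is that one character column is enough to turn every row into a character vector.

```r
# Hypothetical toy data with one character column
toy <- data.frame(a = c(1.5, 2.5), b = c(10, 20), label = c("u", "v"))

# apply() goes through as.matrix(), so each row arrives as a character vector
row_classes <- apply(toy, 1, class)

# Restricting to the numeric columns first keeps each row numeric
num_cols <- vapply(toy, is.numeric, logical(1))
row_sums <- apply(toy[num_cols], 1, sum)

print(row_classes)
print(row_sums)
```

So apply is fine as long as the non-numeric columns are dropped before calling it.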

Instead we can use (the aging) plyr and adply to do this simply, since plyr lets us treat a one-row data frame as a vector:

df_1 %>% select(-y) %>% plyr::adply(1, function(df) c(v1 = sd(df[1, ])))

Note that some functions, like var, won't work on a one-row data frame, so we need to convert the row to a vector using as.numeric first.
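A minimal base R sketch of that conversion, on a made-up data frame `toy` (not the question's data). var applied directly to a data frame treats the columns as variables and computes a covariance matrix, so the row has to become a plain vector before it is passed in.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(x.1 = c(2, 1), x.2 = c(4, 1), x.3 = c(6, 4))

first_row <- toy[1, ]               # still a data frame with one row
row_vec   <- as.numeric(first_row)  # plain numeric vector: 2, 4, 6

# Sample variance of the values in the first row
row_variance <- var(row_vec)

print(row_variance)
```

The same as.numeric call can be used inside the function passed to adply.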

Upvotes: 1

Artem Sokolov

Reputation: 13691

This can be done using purrr::pmap, which passes a list of arguments to a function that accepts "dots". Since most functions like mean, sd, etc. work with vectors, you need to pair the call with a domain lifter:

df_1 %>% select(-y) %>% mutate( var = pmap(., lift_vd(mean)) )
#         x.1      x.2      x.3      x.4      var
# 1  70.12072 62.99024 54.00672 86.81358 68.48282
# 2  49.40462 47.00752 21.99248 78.87789 49.32063

df_1 %>% select(-y) %>% mutate( var = pmap(., lift_vd(sd)) )
#         x.1      x.2      x.3      x.4      var
# 1  70.12072 62.99024 54.00672 86.81358 13.88555
# 2  49.40462 47.00752 21.99248 78.87789 23.27958

The function sum accepts dots directly, so you don't need to lift its domain:

df_1 %>% select(-y) %>% mutate( var = pmap(., sum) )
#         x.1      x.2      x.3      x.4      var
# 1  70.12072 62.99024 54.00672 86.81358 273.9313
# 2  49.40462 47.00752 21.99248 78.87789 197.2825

Everything conforms to the standard dplyr data processing, so all three can be combined as separate arguments to mutate:

df_1 %>% select(-y) %>% 
  mutate( v1 = pmap(., lift_vd(mean)),
          v2 = pmap(., lift_vd(sd)),
          v3 = pmap(., sum) )
#         x.1      x.2      x.3      x.4       v1       v2       v3
# 1  70.12072 62.99024 54.00672 86.81358 68.48282 13.88555 273.9313
# 2  49.40462 47.00752 21.99248 78.87789 49.32063 23.27958 197.2825
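One caveat: the lift_*() helpers are deprecated as of purrr 1.0.0, and pmap returns a list column rather than a numeric one. A possible variant, sketched here under the assumption that a plain double column is wanted, uses pmap_dbl with small wrapper functions that collect the dots into a vector. The set.seed call is added only to make the sketch reproducible.

```r
library(dplyr)
library(purrr)

set.seed(1)  # added for reproducibility; not part of the original data setup
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

res <- df_1 %>%
  select(-y) %>%
  mutate(
    # c(...) gathers the row's values into one vector before summarising
    v1 = pmap_dbl(., function(...) mean(c(...))),
    v2 = pmap_dbl(., function(...) sd(c(...))),
    # sum() accepts dots directly, so no wrapper is needed
    v3 = pmap_dbl(., sum)
  )

head(res, 2)
```

Here `.` still refers to the four x columns coming out of select, so v2 and v3 are not affected by the newly created v1.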

Upvotes: 4

zack

Reputation: 5405

A few approaches I've taken in the past:

  • use a pre-existing row-wise function (e.g. rowSums)
  • use reduce (which only works for functions that combine two vectors at a time)
  • do some complicated transposing
  • write a custom function and call it with pmap

Using pre-existing row-wise functions

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE)
)

library(tidyverse)

# rowSums
df_1 %>%
  mutate(var = rowSums(select(., -y))) %>%
  head()
#>        x.1      x.2      x.3      x.4 y      var
#> 1 41.24069 58.56641 93.03007 39.17035 3 232.0075
#> 2 49.76991 67.96527 43.48827 24.71475 2 185.9382
#> 3 65.82827 59.48330 56.72526 71.38306 2 253.4199
#> 4 92.65662 34.89741 46.59157 90.10154 1 264.2471
#> 5 36.13455 86.18987 72.06964 82.31317 3 276.7072
#> 6 91.87117 73.47734 40.64134 83.78471 2 289.7746

Using reduce

df_1 %>%
  mutate(var = reduce(select(., -y),`+`))  %>%
  head()
#>        x.1      x.2      x.3      x.4 y      var
#> 1 41.24069 58.56641 93.03007 39.17035 3 232.0075
#> 2 49.76991 67.96527 43.48827 24.71475 2 185.9382
#> 3 65.82827 59.48330 56.72526 71.38306 2 253.4199
#> 4 92.65662 34.89741 46.59157 90.10154 1 264.2471
#> 5 36.13455 86.18987 72.06964 82.31317 3 276.7072
#> 6 91.87117 73.47734 40.64134 83.78471 2 289.7746
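To illustrate which functions reduce can handle, here is a small sketch on made-up data (`toy` is hypothetical). reduce folds the columns together two at a time, so it suits element-wise binary operations such as `+` or pmax, but not a statistic like var that needs all of a row's values at once.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  x.1 = c(1, 5),
  x.2 = c(4, 2),
  x.3 = c(3, 9),
  y   = c(1L, 2L)
)

res <- toy %>%
  mutate(
    total   = reduce(select(., -y), `+`),   # ((x.1 + x.2) + x.3), element-wise
    row_max = reduce(select(., -y), pmax)   # running element-wise maximum
  )

res
```

The totals are 8 and 16, and the row-wise maxima are 4 and 9.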

Ugly transposing and matrix / data.frame conversion

df_1 %>%
  mutate(var = select(., -y) %>% as.matrix %>% t %>% as.data.frame %>% map_dbl(var)) %>%
  head()
#>        x.1      x.2      x.3      x.4 y       var
#> 1 41.24069 58.56641 93.03007 39.17035 3 620.95228
#> 2 49.76991 67.96527 43.48827 24.71475 2 318.37221
#> 3 65.82827 59.48330 56.72526 71.38306 2  43.17011
#> 4 92.65662 34.89741 46.59157 90.10154 1 878.50087
#> 5 36.13455 86.18987 72.06964 82.31317 3 520.72241
#> 6 91.87117 73.47734 40.64134 83.78471 2 506.16785

Custom function with pmap

my_var <- function(...){
  vec <-  c(...)
  var(vec)
}

df_1 %>%
  mutate(var = select(., -y) %>% pmap(my_var)) %>%
  head()
#>        x.1      x.2      x.3      x.4 y      var
#> 1 41.24069 58.56641 93.03007 39.17035 3 620.9523
#> 2 49.76991 67.96527 43.48827 24.71475 2 318.3722
#> 3 65.82827 59.48330 56.72526 71.38306 2 43.17011
#> 4 92.65662 34.89741 46.59157 90.10154 1 878.5009
#> 5 36.13455 86.18987 72.06964 82.31317 3 520.7224
#> 6 91.87117 73.47734 40.64134 83.78471 2 506.1679

Created on 2019-04-30 by the reprex package (v0.2.1)

Upvotes: 2

cole

Reputation: 1757

I think this is tricky because the scoped variants of mutate (mutate_at, mutate_all, mutate_if) are generally aimed at executing a function on a specific column, instead of creating an operation that uses all columns.

The simplest solution I can come up with basically amounts to creating a vector (cols) that is then used to execute the summary operation:

library(dplyr)
library(purrr)

df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)), 
  y = sample(1:3, 30, replace = TRUE)
)

# create vector of columns to operate on
cols <- names(df_1)
cols <- cols[map_lgl(df_1, is.numeric)]
cols <- cols[! cols %in% c("y")]

cols
#> [1] "x.1" "x.2" "x.3" "x.4"

df_1 %>% 
  select(-y) %>% 
  rowwise() %>% 
  mutate(
    var = sum(!!!map(cols, as.name), na.rm = TRUE)
  )
#> Source: local data frame [30 x 5]
#> Groups: <by row>
#> 
#> # A tibble: 30 x 5
#>      x.1   x.2   x.3   x.4   var
#>    <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  46.1  28.9  28.9  50.7  155.
#>  2  26.8  68.0  67.1  26.5  188.
#>  3  35.2  63.8  62.5  28.5  190.
#>  4  31.3  44.9  67.3  68.2  212.
#>  5  52.6  23.9  83.2  43.4  203.
#>  6  55.7  92.8  86.3  57.2  292.
#>  7  56.9  50.0  77.6  25.6  210.
#>  8  95.0  82.6  86.1  22.7  286.
#>  9  62.7  26.5  61.0  88.9  239.
#> 10  65.2  23.1  25.5  51.0  165.
#> # … with 20 more rows

Created on 2019-04-30 by the reprex package (v0.2.1)

NOTE: if you are unfamiliar with purrr, you can also use something like lapply, etc.
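For instance, a sketch of the same idea without purrr, using vapply and lapply from base R in place of map_lgl and map. The set.seed call and the trailing ungroup() are additions for this sketch only.

```r
library(dplyr)

set.seed(1)  # added for reproducibility
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# create vector of numeric columns to operate on, excluding y
is_num <- vapply(df_1, is.numeric, logical(1))
cols <- setdiff(names(df_1)[is_num], "y")

res <- df_1 %>%
  select(-y) %>%
  rowwise() %>%
  # lapply(cols, as.name) builds a list of symbols that !!! splices into sum()
  mutate(var = sum(!!!lapply(cols, as.name), na.rm = TRUE)) %>%
  ungroup()

head(res)
```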

You can read more about the trickier dplyr programming operators (!!, !!!, etc.) here:

https://dplyr.tidyverse.org/articles/programming.html

Upvotes: 2
