C_Z_
C_Z_

Reputation: 7816

Count number of values in row using dplyr

This question should have a simple, elegant solution but I can't figure it out, so here it goes:

Let's say I have the following dataset and I want to count the number of 2s present in each row using dplyr.

set.seed(1)
ID <- LETTERS[1:5]
X1 <- sample(1:5, 5,T)
X2 <- sample(1:5, 5,T)
X3 <- sample(1:5, 5,T)

df <- data.frame(ID,X1,X2,X3)
library(dplyr)

Now, the following works:

df %>%
  rowwise %>%
  mutate(numtwos = sum(c(X1,X2,X3) == 2))

But how do I avoid typing out all of the column names?

I know this is probably easier to do without dplyr, but more generally I want to know how I can use dplyr's mutate with multiple columns without typing out all the column names.

Upvotes: 7

Views: 7270

Answers (6)

SeanM
SeanM

Reputation: 135

For dplyr > 1.0.0, below is a quick solution using across() to specific column ranges to operate on:

df %>% 
   mutate(numtwos = rowSums(across(X1:X3, ~ .x == 2)))

     ID X1 X2 X3 numtwos
  1  A  2  5  2       2
  2  B  2  5  1       1
  3  C  3  4  4       0
  4  D  5  4  2       1
  5  E  2  1  4       1 

Upvotes: 0

broesel23
broesel23

Reputation: 117

Just wanted to add to the answer of @evan.oman in case you only want to sum rows for specific columns, not all of them. You can use the regular select and/or select_helpers functions. In this example, we don't want to include X1 in rowSums:

df %>% 
  mutate(numtwos = rowSums(select(., -X1) == 2))

  ID X1 X2 X3 numtwos
1  A  2  5  2       1
2  B  2  5  1       0
3  C  3  4  4       0
4  D  5  4  2       1
5  E  2  1  4       0

Upvotes: 6

Steven Beaupr&#233;
Steven Beaupr&#233;

Reputation: 21641

Here's another alternative using purrr:

library(purrr)

df %>%
  by_row(function(x) {
    sum(x[-1] == 2) },
    .to = "numtwos",
    .collate = "cols"
  )

Which gives:

#Source: local data frame [5 x 5]
#
#      ID    X1    X2    X3 numtwos
#  <fctr> <int> <int> <int>   <int>
#1      A     2     5     2       2
#2      B     2     5     1       1
#3      C     3     4     4       0
#4      D     5     4     2       1
#5      E     2     1     4       1

As per mentioned in the NEWS, row based functionals are still maturing in dplyr:

We are still figuring out what belongs in dplyr and what belongs in purrr. Expect much experimentation and many changes with these functions.


Benchmark

We can see how rowwise() and do() compare to purrr::by_row() for this type of problem and how they "perform" against rowSums() and the tidy data way:

largedf <-  df[rep(seq_len(nrow(df)), 10e3), ]

library(microbenchmark)
microbenchmark(
  steven = largedf %>% 
    by_row(function(x) { 
      sum(x[-1] == 2) }, 
      .to = "numtwos", 
      .collate = "cols"),
  psidom = largedf %>% 
    rowwise %>% 
    do(data_frame(numtwos = sum(.[-1] == 2))) %>% 
    cbind(largedf, .),
  gopala = largedf %>% 
    gather(key, value, -ID) %>% 
    group_by(ID) %>% 
    summarise(numtwos = sum(value == 2)) %>% 
    inner_join(largedf, .),
  evan   = largedf %>% 
    mutate(numtwos = rowSums(. == 2)),
  times  = 10L,
  unit   = "relative"
)

Results:

#Unit: relative
#   expr         min          lq        mean      median         uq         max neval cld
# steven 1225.190659 1261.466936 1267.737126 1227.762573 1276.07977 1339.841636    10  b 
# psidom 3677.603240 3759.402212 3726.891458 3678.717170 3728.78828 3777.425492    10   c
# gopala    2.715005    2.684599    2.638425    2.612631    2.59827    2.572972    10 a  
#   evan    1.000000    1.000000    1.000000    1.000000    1.00000    1.000000    10 a  

Upvotes: 6

evan.oman
evan.oman

Reputation: 5572

Try rowSums:

> set.seed(1)
> ID <- LETTERS[1:5]
> X1 <- sample(1:5, 5,T)
> X2 <- sample(1:5, 5,T)
> X3 <- sample(1:5, 5,T)
> df <- data.frame(ID,X1,X2,X3)
> df
  ID X1 X2 X3
1  A  2  5  2
2  B  2  5  1
3  C  3  4  4
4  D  5  4  2
5  E  2  1  4
> rowSums(df == 2)
[1] 2 1 0 1 1

Alternatively, with dplyr:

> df %>% mutate(numtwos = rowSums(. == 2))
  ID X1 X2 X3 numtwos
1  A  2  5  2       2
2  B  2  5  1       1
3  C  3  4  4       0
4  D  5  4  2       1
5  E  2  1  4       1

Upvotes: 15

akuiper
akuiper

Reputation: 215117

You can use do, which doesn't add the column to your original data frame and you need to add the column to your original data frame.

df %>%
    rowwise %>%
    do(numtwos = sum(.[-1] == 2)) %>% 
    data.frame
  numtwos
1       2
2       1
3       0
4       1
5       1

Add a cbind to bind the new column to the original data frame:

df %>%
     rowwise %>%
     do(numtwos = sum(.[-1] == 2)) %>% 
     data.frame %>% cbind(df, .)

  ID X1 X2 X3 numtwos
1  A  2  5  2       2
2  B  2  5  1       1
3  C  3  4  4       0
4  D  5  4  2       1
5  E  2  1  4       1 

Upvotes: 1

Gopala
Gopala

Reputation: 10483

One approach is to use a combination of dplyr and tidyr to convert data into long format, and do the computation:

library(dplyr)
library(tidyr)
df %>%
  gather(key, value, -ID) %>%
  group_by(ID) %>%
  summarise(numtwos = sum(value == 2)) %>%
  inner_join(df, .)

Output is as follows:

  ID X1 X2 X3 numtwos
1  A  2  5  2       2
2  B  2  5  1       1
3  C  3  4  4       0
4  D  5  4  2       1
5  E  2  1  4       1

Upvotes: 2

Related Questions