ant
ant

Reputation: 585

Collapsing rows by user with dplyr

I want to collapse the rows based on users while placing the '1' on their corresponding columns.

Each row for each user can only have one '1' so there need not be any adding to the rows following.

My df:

User  +1  +2  +3  +4  +5
   A   1   0   0   0   0
   A   0   1   0   0   0
   A   0   0   0   0   1
   B   0   0   1   0   0 
   B   0   0   0   1   0

Expected result:

User  +1  +2  +3  +4  +5
   A   1   1   0   0   1
   B   0   0   1   1   0 

Any help would be appreciated.

Upvotes: 12

Views: 21297

Answers (3)

talat
talat

Reputation: 70256

Looks like you can use summarise_each:

df %>% group_by(User) %>% summarise_all(funs(sum))

Edit note: replaced summarise_each which is now deprecated with summarise_all

Upvotes: 23

user295691
user295691

Reputation: 7248

The way that I would approach this would be to convert your data to long form first, then do the aggregation, and convert back out to wide form if necessary for display purposes.

So, using tidyr,

df %>%
  pivot_longer(cols=-User, names_to="rating", values_to="count") %>%
  group_by(User, rating) %>%
  summarise(count = max(count)) %>% 
  spread(rating, count)

The first gather converts to long form (using p instead of +):

> df <- read.table(header=TRUE, text='User  p1  p2  p3  p4  p5
   A   1   0   0   0   0
   A   0   1   0   0   0
   A   0   0   0   0   1
   B   0   0   1   0   0 
   B   0   0   0   1   0
')
> df %>% pivot_longer(
     cols=-User, names_to="rating", values_to="count")
# A tibble: 25 × 3
   User  rating count
   <chr> <chr>  <int>
 1 A     p1         1
 2 A     p2         0
 3 A     p3         0
 4 A     p4         0
 5 A     p5         0
 6 A     p1         0
 7 A     p2         1
 8 A     p3         0
 9 A     p4         0
10 A     p5         0
# … with 15 more rows

And the remaining steps perform the aggregation, then transform back to wide format.

Upvotes: 3

David Arenburg
David Arenburg

Reputation: 92282

Here's alternatve dplyr solution

df %>% group_by(User) %>% do(as.list(colSums(.)))

Or a data.table possible implementation

library(data.table)
setDT(df)[, lapply(.SD, sum), User]

Or

setDT(df)[, as.list(colSums(.SD)), User]

Or with base R, even simpler

aggregate(. ~ User, df, sum)

Upvotes: 7

Related Questions