firmo23
firmo23

Reputation: 8464

Summarize and count unique values of a column based on the values of another column in a dataframe

I have the dataframe below

name<-c("Jack","Bob","Jack","Bill","Jack","Bob")
items<-c("car","house","ball","desk","bike","chair")
d<-data.frame(name,item)

name  items
1 Jack   car
2  Bob house
3 Jack  ball
4 Bill  desk
5 Jack  bike
6  Bob chair

and I want to convert it to a dataframe in which the unique items will be summarized based on the name and a new column will be added with their count, so it will be like:

name         items count
1 Jack car,ball,bike     3
2  Bob   house,chair     2
3 Bill         chair     1

Upvotes: 0

Views: 56

Answers (3)

MarBlo
MarBlo

Reputation: 4534

d %>% group_by(name) %>% 
  mutate(foo = paste0(items, collapse = ",")) %>% 
  mutate(count_w = length(foo)) %>% 
  dplyr::select(-items) %>% 
  distinct()

# A tibble: 3 x 3
# Groups:   name [3]
  name foo   count_w
  <chr> <chr>          <int>
1 Jack  car,ball,bike      3
2 Bob   house,chair        2
3 Bill  desk               1

Upvotes: 1

Jan
Jan

Reputation: 5284

A base R solution without extra packages

d <-data.frame(name  = c("Jack","Bob","Jack","Bill","Jack","Bob"),
               items = c("car","house","ball","desk","bike","chair"))

Get the frequencies for names and add another column for the concatenated items.

result <- margin.table(table(d), 1)
sdf <- data.frame(items = paste(d$items[d$name == names(result)], collapse = ", "), result)

reorder columns

sdf <- sdf[, c(2, 1, 3)]
sdf
#>   name             items Freq
#> 1 Bill house, ball, desk    1
#> 2  Bob house, ball, desk    2
#> 3 Jack house, ball, desk    3

Created on 2021-01-10 by the reprex package (v0.3.0)

Upvotes: 1

latlio
latlio

Reputation: 1587

Although I'm not convinced that a comma-separated sequence is the best way to approach further data processing, here is code that does what you want:

library(dplyr)

d %>%
  group_by(name) %>% 
  summarize(count = n(),
            items = toString(items)) %>%
  ungroup()

# A tibble: 3 x 3
  name  count items          
  <chr> <int> <chr>          
1 Bill      1 desk           
2 Bob       2 house, chair   
3 Jack      3 car, ball, bike

Upvotes: 2

Related Questions