R group by and encode top 5 occurring rows

Question

I have the following dataframe:

ID | Item |
1  |  A   |
1  |  A   |
1  |  A   |
1  |  B   |
2  |  B   |
2  |  B   |
1  |  B   |
1  |  C   |
2  |  A   |
2  |  A   |
3  |  C   |
3  |  B   |
3  |  C   |
3  |  B   |
3  |  A   |
2  |  C   |

I would like to take the Top 2 most popular/occurring item of each ID and encode them. Desired result is:

ID | Item A | Item B | Item C |
1  |   1    |   1    |   0    |
2  |   1    |   1    |   0    |
3  |   0    |   1    |   1    |

As long as the item is in the Top 2 of an ID, it will get a count of 1. How can I go about in R? I'm thinking of group_by %>% top_n(n=2)...

www · Accepted Answer

Here is one idea. One thing to think about is what do you want to do if there are ties? Here I sort the columns and then slice for the first two. You may come up with another way to deal with ties.

library(tidyverse)

dat2 <- dat %>%
  count(ID, Item) %>%
  arrange(ID, desc(n), Item) %>%
  group_by(ID) %>%
  slice(1:2) %>%
  mutate(n = 1) %>%
  pivot_wider(names_from = Item, values_from = n, values_fill = list(n = 0)) %>%
  ungroup()
dat2
# # A tibble: 3 x 4
#   ID     A     B     C
#      
# 1     1     1     1     0
# 2     2     1     1     0
# 3     3     0     1     1

DATA

dat <- read.table(text = "ID Item
1  A
1  A
1  A
1  B
2  B
2  B
1  B
1  C
2  A
2  A
3  C
3  B
3  C
3  B
3  A
2  C",
                  header = TRUE, stringsAsFactors = FALSE)

R group by and encode top 5 occurring rows

Answers (2)

Related Questions