spidermarn
spidermarn

Reputation: 939

R group by and encode top 5 occurring rows

I have the following dataframe:

ID | Item |
1  |  A   |
1  |  A   |
1  |  A   |
1  |  B   |
2  |  B   |
2  |  B   |
1  |  B   |
1  |  C   |
2  |  A   |
2  |  A   |
3  |  C   |
3  |  B   |
3  |  C   |
3  |  B   |
3  |  A   |
2  |  C   |

I would like to take the Top 2 most popular/occurring item of each ID and encode them. Desired result is:

ID | Item A | Item B | Item C |
1  |   1    |   1    |   0    |
2  |   1    |   1    |   0    |
3  |   0    |   1    |   1    |

As long as the item is in the Top 2 of an ID, it will get a count of 1. How can I go about in R? I'm thinking of group_by %>% top_n(n=2)...

Upvotes: 1

Views: 78

Answers (2)

Darren Tsai
Darren Tsai

Reputation: 35564

library(tidyverse)

df %>%
  group_by(ID) %>% 
  count(Item) %>% 
  top_n(2, n) %>%
  ungroup() %>%
  pivot_wider(names_from = Item, values_from = n,
              values_fn = list(n = ~ 1),
              values_fill = list(n = 0))

# # A tibble: 3 x 4
#      ID     A     B     C
#   <int> <dbl> <dbl> <dbl>
# 1     1     1     1     0
# 2     2     1     1     0
# 3     3     0     1     1

Description

  • values_fn = list(n = ~ 1): convert count numbers to 1
    ( It's equivalent to add mutate(n = 1) before pivot_wider)

  • values_fill = list(n = 0): specify 0 to be filled when missing

Upvotes: 2

www
www

Reputation: 39154

Here is one idea. One thing to think about is what do you want to do if there are ties? Here I sort the columns and then slice for the first two. You may come up with another way to deal with ties.

library(tidyverse)

dat2 <- dat %>%
  count(ID, Item) %>%
  arrange(ID, desc(n), Item) %>%
  group_by(ID) %>%
  slice(1:2) %>%
  mutate(n = 1) %>%
  pivot_wider(names_from = Item, values_from = n, values_fill = list(n = 0)) %>%
  ungroup()
dat2
# # A tibble: 3 x 4
#   ID     A     B     C
#   <int> <dbl> <dbl> <dbl>
# 1     1     1     1     0
# 2     2     1     1     0
# 3     3     0     1     1

DATA

dat <- read.table(text = "ID Item
1  A
1  A
1  A
1  B
2  B
2  B
1  B
1  C
2  A
2  A
3  C
3  B
3  C
3  B
3  A
2  C",
                  header = TRUE, stringsAsFactors = FALSE)

Upvotes: 2

Related Questions