Reputation: 939
I have the following dataframe:
ID | Item |
1 | A |
1 | A |
1 | A |
1 | B |
2 | B |
2 | B |
1 | B |
1 | C |
2 | A |
2 | A |
3 | C |
3 | B |
3 | C |
3 | B |
3 | A |
2 | C |
I would like to take the Top 2 most popular/occurring item of each ID and encode them. Desired result is:
ID | Item A | Item B | Item C |
1 | 1 | 1 | 0 |
2 | 1 | 1 | 0 |
3 | 0 | 1 | 1 |
As long as the item is in the Top 2 of an ID, it will get a count of 1. How can I go about in R? I'm thinking of group_by %>% top_n(n=2)...
Upvotes: 1
Views: 78
Reputation: 35564
library(tidyverse)
df %>%
group_by(ID) %>%
count(Item) %>%
top_n(2, n) %>%
ungroup() %>%
pivot_wider(names_from = Item, values_from = n,
values_fn = list(n = ~ 1),
values_fill = list(n = 0))
# # A tibble: 3 x 4
# ID A B C
# <int> <dbl> <dbl> <dbl>
# 1 1 1 1 0
# 2 2 1 1 0
# 3 3 0 1 1
Description
values_fn = list(n = ~ 1)
: convert count numbers to 1
( It's equivalent to add mutate(n = 1)
before pivot_wider
)
values_fill = list(n = 0)
: specify 0 to be filled when missing
Upvotes: 2
Reputation: 39154
Here is one idea. One thing to think about is what do you want to do if there are ties? Here I sort the columns and then slice
for the first two. You may come up with another way to deal with ties.
library(tidyverse)
dat2 <- dat %>%
count(ID, Item) %>%
arrange(ID, desc(n), Item) %>%
group_by(ID) %>%
slice(1:2) %>%
mutate(n = 1) %>%
pivot_wider(names_from = Item, values_from = n, values_fill = list(n = 0)) %>%
ungroup()
dat2
# # A tibble: 3 x 4
# ID A B C
# <int> <dbl> <dbl> <dbl>
# 1 1 1 1 0
# 2 2 1 1 0
# 3 3 0 1 1
DATA
dat <- read.table(text = "ID Item
1 A
1 A
1 A
1 B
2 B
2 B
1 B
1 C
2 A
2 A
3 C
3 B
3 C
3 B
3 A
2 C",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 2