Reputation: 5
I have a data frame in R that looks like this:
Id group category number
001 1 A 0.10
001 1 B 0.15
002 2 A 0.55
003 3 A 0.75
003 3 B 0.45
Now, I would like to have only one row per Id. For Ids in groups 1 and 2, the row whose category is B should be preferred. If there is no row with category B for an Id in groups 1 or 2, then the category A row should be used. For Ids whose group is 3, the row where the category is A should always be used.
The output should look like this:
Id group category number
001 1 B 0.15
002 2 A 0.55
003 3 A 0.75
How could this be done in R?
Upvotes: 0
Views: 102
Reputation: 146090
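Since B sorts after A, we can sort by category descending and keep the first row per Id. Group 3 should always use A, so we first filter out the group 3 / category B rows.
library(dplyr)
your_data %>%
  # group 3 always uses category A, so drop its B rows up front
  filter(!(group == 3 & category == "B")) %>%
  group_by(Id, group) %>%
  # descending order puts B ahead of A, so B wins whenever it is present
  arrange(desc(category)) %>%
  slice(1)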
Upvotes: 0
Reputation: 887741
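We could use slice: for group 3 take the A row; otherwise take the B row if there is one and fall back to the first row.
library(dplyr)
df1 %>%
  group_by(Id) %>%
  # match() returns the position of the wanted category within each Id
  slice(if (group[1] == 3) match('A', category, nomatch = 0)
        else max(match('B', category, nomatch = 0), 1))
# data
df1 <- structure(list(Id = c("001", "001", "002", "003", "003"), group = c(1L,
1L, 2L, 3L, 3L), category = c("A", "B", "A", "A", "B"), number = c(0.1,
0.15, 0.55, 0.75, 0.45)), row.names = c(NA, -5L), class = "data.frame")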
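If the selection rule is likely to grow, it may be easier to spell the preference out as an explicit rank and keep the best-ranked row per Id. The following is only a sketch under a few assumptions: `pref` is a made-up helper column (not something from the question), the data only contains categories A and B, and `slice_min()` needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(
  Id = c("001", "001", "002", "003", "003"),
  group = c(1L, 1L, 2L, 3L, 3L),
  category = c("A", "B", "A", "A", "B"),
  number = c(0.10, 0.15, 0.55, 0.75, 0.45),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # pref is a hypothetical helper column: lower value = more preferred,
  # NA = this row must never be selected
  mutate(pref = case_when(
    group == 3 & category == "A" ~ 1L,
    group == 3                   ~ NA_integer_,  # group 3 never uses B
    category == "B"              ~ 1L,           # groups 1 and 2 prefer B
    TRUE                         ~ 2L            # ... and fall back to A
  )) %>%
  filter(!is.na(pref)) %>%
  group_by(Id) %>%
  # keep exactly one row per Id: the one with the lowest pref
  slice_min(pref, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-pref)

print(result)
```

On the example data this should leave one row per Id, matching the desired output in the question. Adding a new rule then just means adding another line to the `case_when()`.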
Upvotes: 1