Falcc

Reputation: 5

Conditional subset in data frame in R

I have a data frame in R that looks like this:

Id   group   category number
001  1       A        0.10
001  1       B        0.15
002  2       A        0.55
003  3       A        0.75
003  3       B        0.45

Now, I would like to keep only one row per Id. For Ids in groups 1 and 2, the row whose category is B should be preferred. If a group 1 or group 2 Id has no row with category B, the category A row should be used instead. For Ids in group 3, the row with category A should always be used.

The output should look like this:

Id   group   category number
001  1       B        0.15
002  2       A        0.55
003  3       A        0.75

How could this be done in R?

Upvotes: 0

Views: 102

Answers (2)

Gregor Thomas

Reputation: 146090

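Since B sorts after A, we can sort by category in descending order and keep the first row per Id. Group 3 should always use category A, so we drop its category B rows first.

library(dplyr)
your_data %>%
  # group 3 must always use category A, so remove its B rows up front
  filter(!(group == 3 & category == "B")) %>%
  group_by(Id, group) %>%
  # descending order puts "B" ahead of "A", so B wins whenever it is present
  arrange(desc(category)) %>%
  slice(1) %>%
  ungroup()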

Upvotes: 0

akrun

Reputation: 887741

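We could use slice with match: match returns the position of 'B' within each Id (or 0 if there is none), and max(..., 1) falls back to the first row. The group 3 'B' rows are filtered out first so that group 3 always keeps 'A'. This assumes that, when an Id has no 'B' row left, its first row is the 'A' row.

library(dplyr)
df1 %>% 
   filter(!(group == 3 & category == 'B')) %>%
   group_by(Id) %>%
   slice(max(match('B', category, nomatch = 0), 1))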

data

df1 <- structure(list(Id = c("001", "001", "002", "003", "003"), group = c(1L, 
1L, 2L, 3L, 3L), category = c("A", "B", "A", "A", "B"), number = c(0.1, 
0.15, 0.55, 0.75, 0.45)), row.names = c(NA, -5L), class = "data.frame")
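
For comparison, here is a minimal base R sketch of the same rule without dplyr. It rebuilds the sample data from the question so it runs on its own. `pick_row` is a hypothetical helper name introduced only for this illustration, and the sketch assumes every Id has at least one row in an acceptable category (otherwise it would return a row of NAs).

```r
# Sample data from the question
df1 <- data.frame(
  Id       = c("001", "001", "002", "003", "003"),
  group    = c(1L, 1L, 2L, 3L, 3L),
  category = c("A", "B", "A", "A", "B"),
  number   = c(0.10, 0.15, 0.55, 0.75, 0.45),
  stringsAsFactors = FALSE
)

# pick_row() is a hypothetical helper: it returns the one row to keep for a single Id
pick_row <- function(d) {
  # group 3 only accepts "A"; groups 1 and 2 prefer "B" and fall back to "A"
  wanted <- if (d$group[1] == 3) "A" else c("B", "A")
  # match() gives the row position of each wanted category (NA if absent);
  # the first non-NA position is the preferred row
  idx <- match(wanted, d$category)
  d[idx[!is.na(idx)][1], , drop = FALSE]
}

# Apply the helper to each Id and stack the results back into one data frame
result <- do.call(rbind, lapply(split(df1, df1$Id), pick_row))
rownames(result) <- NULL
result
```

Listing the categories in order of preference inside `wanted` keeps the rule in one place, so adding another fallback category would only mean extending that vector.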

Upvotes: 1
