thbtmntgn

Reputation: 171

R - Select specific rows within group of rows

I have a dataframe like the following one:

     ID          STATUS
1638483        Very bad
1407499       Very good
1383920            Good
1407499             Bad

The first column contains ID; some IDs are unique, others appear more than once.
The second column contains STATUS, which can be "Very good", "Good", "Bad", or "Very bad".

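I'd like to keep a single row per ID: IDs that appear once stay as they are, and for duplicated IDs only the row with the best STATUS is kept.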

The desired output would be:

     ID          STATUS
1638483        Very bad
1407499       Very good
1383920            Good

I tried to use the dplyr package. I managed to group the data by ID, but then I got stuck.

Upvotes: 2

Views: 3764

Answers (2)

Stijn

Reputation: 96

One possible solution using dplyr:

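# load dplyr (also provides tibble() and %>%)
library(dplyr)

# create tibble
df <- tibble(
  id = c("1638483", "1407499", "1383920", "1407499"),
  status = c("Very bad", "Very good", "Good", "Bad")
)

# solution
df %>%
  # order the levels from worst to best
  mutate(status = factor(status,
                         levels = c("Very bad", "Bad", "Good", "Very good"))) %>%
  # best status first
  arrange(desc(status)) %>%
  group_by(id) %>%
  # keep the rows matching the best status within each id
  filter(status == status[1]) %>%
  ungroup()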

Result:

# A tibble: 3 x 2
       id    status
    <chr>    <fctr>
1 1383920      Good
2 1407499 Very good
3 1638483  Very bad
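
The factor conversion is what makes the sort meaningful. As a small base R sketch (the variable names `status`, `alphabetical`, and `status_f` are only illustrative), this compares a plain character sort with a sort on a factor whose levels run from worst to best:

```r
status <- c("Very bad", "Very good", "Good", "Bad")

# Plain character vector: sorting is alphabetical,
# so "Very bad" ends up ranked above "Good"
alphabetical <- sort(status, decreasing = TRUE)
alphabetical

# Factor with explicit levels, ordered from worst to best
status_f <- factor(status, levels = c("Very bad", "Bad", "Good", "Very good"))

# Each value is now backed by its position in the level order
as.integer(status_f)

# Sorting the factor follows the level order instead of the alphabet
by_level <- as.character(sort(status_f, decreasing = TRUE))
by_level
```

Without the explicit levels, sorting in descending order would put "Very bad" right after "Very good", which is presumably not the ranking wanted here.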

Upvotes: 3

d.b

Reputation: 32548

Convert STATUS to a factor with the levels in the desired order (worst to best), then use ave to keep, within each ID, the rows whose status is the maximum:

df$STATUS = factor(df$STATUS, levels = c("Very bad", "Bad", "Good", "Very good"))
df[ave(as.numeric(df$STATUS), df$ID, FUN = function(x) x == max(x)) == 1,]
#       ID    STATUS
#1 1638483  Very bad
#2 1407499 Very good
#3 1383920      Good

DATA

df = structure(list(ID = c(1638483L, 1407499L, 1383920L, 1407499L), 
    STATUS = c("Very bad", "Very good", "Good", "Bad")), .Names = c("ID", 
"STATUS"), class = "data.frame", row.names = c(NA, -4L))
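
A possible variation, not part of the answer above: if exactly one row per ID is wanted even when two rows tie for the best status, sorting and then dropping duplicated IDs is one way to get it. This is a minimal base R sketch on the same data; the names `sorted` and `best` are only illustrative.

```r
# Same data as in the DATA section above
df <- data.frame(
  ID = c(1638483L, 1407499L, 1383920L, 1407499L),
  STATUS = c("Very bad", "Very good", "Good", "Bad"),
  stringsAsFactors = FALSE
)
df$STATUS <- factor(df$STATUS, levels = c("Very bad", "Bad", "Good", "Very good"))

# Sort by ID, and within each ID put the best status first
sorted <- df[order(df$ID, -as.integer(df$STATUS)), ]

# duplicated() flags every repeat of an ID after its first occurrence,
# so negating it keeps only the first (best) row per ID
best <- sorted[!duplicated(sorted$ID), ]
best
```

Unlike the ave approach, which keeps every row tied for the maximum, this returns a single row per ID, and the rows come back sorted by ID rather than in their original order.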

Upvotes: 1
