Andrea

Reputation: 111

group_by and keep all groups that do not contain a specific value, and filter where the value is present

I have the following dataframe:

df <- data.frame(
  Code = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b"),
  Inst = c("Yes", "No", "No", "No", "No", "No", "No", "No", "No", "No"),
  Date = c(
    "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05", 
    "2021-01-06", "2021-01-06", "2021-01-06", "2021-01-09", "2021-01-10"
  )
)

I want to apply dplyr::group_by to the variable Code. In groups that contain the value "Yes" in Inst, I want to keep only the "Yes" rows; in groups that do not contain "Yes", I want to keep all observations. In both cases I then want only the rows with the minimum Date. I tried filter(any(Inst == "Yes")), but this does not work.

I would like to have this result:

Code  Inst  Date
a      Yes  2021-01-01
b      No   2021-01-06
b      No   2021-01-06
b      No   2021-01-06

Upvotes: 3

Views: 1233

Answers (3)

Waldi

Reputation: 41240

With dplyr:

library(dplyr)

df %>%
  group_by(Code) %>%
  summarize(
    across(everything(), function(x) {
      if (any(Inst == "Yes")) x[which.max(Inst == "Yes")] else x
    })
  ) %>%
  ungroup()

#> `summarise()` has grouped output by 'Code'. You can override using the `.groups` argument.
#> # A tibble: 6 x 3
#>   Code  Inst  Date      
#>   <chr> <chr> <chr>     
#> 1 a     Yes   2021-01-01
#> 2 b     No    2021-01-06
#> 3 b     No    2021-01-06
#> 4 b     No    2021-01-06
#> 5 b     No    2021-01-09
#> 6 b     No    2021-01-10

Upvotes: 1

ThomasIsCoding

Reputation: 102251

A dplyr option

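library(dplyr)

df %>%
  group_by(Code) %>%
  # ifelse() returns the function c when every Inst in the group is "No"
  # (keep all rows), otherwise `!` (keep only the rows that are not "No")
  filter(ifelse(all(Inst == "No"), c, `!`)(Inst == "No")) %>%
  # within each group, keep the rows with the earliest Date
  filter(Date == min(Date)) %>%
  ungroup()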

gives

# A tibble: 4 x 3
  Code  Inst  Date      
  <chr> <chr> <chr>
1 a     Yes   2021-01-01
2 b     No    2021-01-06
3 b     No    2021-01-06
4 b     No    2021-01-06
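
The one-liner above selects a function with ifelse(), which can be hard to read. A more explicit spelling of the same idea might look like the sketch below. It assumes Inst only ever holds "Yes" or "No" and that Date stays an ISO-formatted string (so string comparison matches date order); the name res is only for illustration.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  Code = rep(c("a", "b"), each = 5),
  Inst = c("Yes", rep("No", 9)),
  Date = c(
    "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05",
    "2021-01-06", "2021-01-06", "2021-01-06", "2021-01-09", "2021-01-10"
  ),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Code) %>%
  # groups without any "Yes" keep every row; other groups keep only "Yes" rows
  filter(!any(Inst == "Yes") | Inst == "Yes") %>%
  # of the remaining rows, keep those on the earliest date per group
  filter(Date == min(Date)) %>%
  ungroup()

print(res)
```

Here !any(Inst == "Yes") is a single TRUE/FALSE per group, so it is recycled against the row-wise Inst == "Yes" test.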

Upvotes: 0

tmfmnk

Reputation: 40131

If there could be multiple Yes values:

df %>%
 group_by(Code) %>%
 slice(if(all(Inst != "Yes")) 1:n() else which(Inst == "Yes"))

  Code  Inst 
  <chr> <chr>
1 a     Yes  
2 b     No   
3 b     No   
4 b     No   
5 b     No   
6 b     No  

Considering the updated question:

df %>%
 mutate(Date = as.Date(Date, format = "%Y-%m-%d")) %>%
 group_by(Code) %>%
 slice(if(all(Inst != "Yes")) 1:n() else which(Inst == "Yes")) %>%
 filter(Date == min(Date))

  Code  Inst  Date      
  <chr> <chr> <date>    
1 a     Yes   2021-01-01
2 b     No    2021-01-06
3 b     No    2021-01-06
4 b     No    2021-01-06
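
To illustrate the "multiple Yes values" case, here is a small sketch on made-up data (df2 and res2 are hypothetical names, not from the question). Group "a" has two "Yes" rows plus an earlier "No" row, and group "b" has no "Yes" at all.

```r
library(dplyr)

# Hypothetical data: group "a" has two "Yes" rows, group "b" has none
df2 <- data.frame(
  Code = c("a", "a", "a", "b", "b", "b"),
  Inst = c("No", "Yes", "Yes", "No", "No", "No"),
  Date = as.Date(c(
    "2021-01-01", "2021-01-03", "2021-01-02",
    "2021-01-06", "2021-01-06", "2021-01-09"
  )),
  stringsAsFactors = FALSE
)

res2 <- df2 %>%
  group_by(Code) %>%
  # keep all rows when the group has no "Yes", otherwise only the "Yes" rows
  slice(if (all(Inst != "Yes")) seq_len(n()) else which(Inst == "Yes")) %>%
  # then keep the earliest date among the rows that are left
  filter(Date == min(Date)) %>%
  ungroup()

print(res2)
```

With this data, group "a" should end up with the single "Yes" row dated 2021-01-02 (the earlier "No" row is dropped because "Yes" rows take precedence), and group "b" with its two 2021-01-06 rows.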

Upvotes: 5
