user438383

Reputation: 6206

dplyr case_when across groups

I have a data frame, df:

df = data.frame(
    group = c(rep("A", 3), rep("B", 3)), 
    vt = c("SO:0001574", "SO:0001619", "SO:0001619", "SO:0001619", "SO:0001619", "SO:0001821")
    )

and two vectors:

tier_1 = c("SO:0001574", "SO:0001575")
tier_2 = c("SO:0001821", "SO:0001822")

I would like to produce an output:

  group         vt     ct
1     A SO:0001574 tier_1
2     A SO:0001619 tier_1
3     A SO:0001619 tier_1
4     B SO:0001619 tier_2
5     B SO:0001619 tier_2
6     B SO:0001821 tier_2

That is, I would like to generate a third column, ct, that is set according to whether any vt value in a group appears in tier_1 or tier_2, so that every row within a given group gets that tier.

I have tried:

library(dplyr)  # provides %>%

df %>%
    dplyr::group_by(group) %>%
    dplyr::mutate(tier = dplyr::case_when(
        vt %in% tier_1 ~ "tier_1",
        vt %in% tier_2 ~ "tier_2"))

But this only fills the individual rows that match, rather than all rows within each group:

# A tibble: 6 x 4
# Groups:   group [2]
  group vt         ct     tier  
  <chr> <chr>      <chr>  <chr> 
1 A     SO:0001574 tier_1 tier_1
2 A     SO:0001619 tier_1 NA    
3 A     SO:0001619 tier_1 NA    
4 B     SO:0001619 tier_2 NA    
5 B     SO:0001619 tier_2 NA    
6 B     SO:0001821 tier_2 tier_2

Upvotes: 2

Views: 1509

Answers (2)

bretauv

Reputation: 8567

You could also use fill() from {tidyr} after the step you already tried:

library(tidyr)
library(dplyr)

df = data.frame(
  group = c(rep("A", 3), rep("B", 3)), 
  vt = c("SO:0001574", "SO:0001619", "SO:0001619", "SO:0001619", "SO:0001619", "SO:0001821")
)
tier_1 = c("SO:0001574", "SO:0001575")
tier_2 = c("SO:0001821", "SO:0001822")

df %>%
  group_by(group) %>% 
  mutate(tier = case_when(
    vt %in% tier_1 ~ "tier_1",
    vt %in% tier_2 ~ "tier_2")) %>%
  fill(tier, .direction = "updown") %>%
  ungroup()

# A tibble: 6 x 3
#  group vt         tier  
#  <chr> <chr>      <chr> 
#1 A     SO:0001574 tier_1
#2 A     SO:0001619 tier_1
#3 A     SO:0001619 tier_1
#4 B     SO:0001619 tier_2
#5 B     SO:0001619 tier_2
#6 B     SO:0001821 tier_2
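
One caveat worth checking on your real data: fill() copies the nearest non-missing value, so it only gives a single tier per group if each group contains values from just one tier. The sketch below uses a made-up group "C" (the names df2 and filled are hypothetical, not from the question) that contains one value from each tier, to show what happens in that case:

```r
library(dplyr)
library(tidyr)

tier_1 <- c("SO:0001574", "SO:0001575")
tier_2 <- c("SO:0001821", "SO:0001822")

# Hypothetical group "C" with one tier_1 value, one unmatched value, one tier_2 value
df2 <- data.frame(
  group = rep("C", 3),
  vt = c("SO:0001574", "SO:0001619", "SO:0001821")
)

filled <- df2 %>%
  group_by(group) %>%
  mutate(tier = case_when(
    vt %in% tier_1 ~ "tier_1",
    vt %in% tier_2 ~ "tier_2")) %>%
  fill(tier, .direction = "updown") %>%   # fills down first, then up
  ungroup()

filled$tier
# "tier_1" "tier_1" "tier_2"  (the group ends up with two different tiers)
```

If mixed groups can occur and one tier should take precedence for the whole group, a per-group test with any() is probably the safer choice.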

Upvotes: 1

Ronak Shah

Reputation: 389135

Wrap each condition in any() to get one logical value per group, which mutate() then recycles to every row of that group:

library(dplyr)

df %>%
 group_by(group) %>% 
 mutate(tier = case_when(
                any(vt %in% tier_1) ~ "tier_1",
                any(vt %in% tier_2) ~ "tier_2"))

#  group vt         tier  
#  <chr> <chr>      <chr> 
#1 A     SO:0001574 tier_1
#2 A     SO:0001619 tier_1
#3 A     SO:0001619 tier_1
#4 B     SO:0001619 tier_2
#5 B     SO:0001619 tier_2
#6 B     SO:0001821 tier_2
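
To see what any() is doing here, it can help to look at the per-group flags on their own. This is only an illustrative sketch using the data from the question; the names flags, has_tier_1 and has_tier_2 are made up for the example:

```r
library(dplyr)

df <- data.frame(
  group = c(rep("A", 3), rep("B", 3)),
  vt = c("SO:0001574", "SO:0001619", "SO:0001619",
         "SO:0001619", "SO:0001619", "SO:0001821")
)
tier_1 <- c("SO:0001574", "SO:0001575")
tier_2 <- c("SO:0001821", "SO:0001822")

# One row per group: does any vt in the group fall in each tier?
flags <- df %>%
  group_by(group) %>%
  summarise(
    has_tier_1 = any(vt %in% tier_1),
    has_tier_2 = any(vt %in% tier_2)
  )

flags
```

Because case_when() uses the first condition that is TRUE, a group containing values from both tiers would be labelled "tier_1" with the ordering above; swap the two conditions if tier_2 should win instead.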

Upvotes: 6
