Jake L

Reputation: 1067

filter() but keep groups without value

I am trying to condense a grouped data frame, pulling out only rows that contain a certain value, but that value isn't present in all groups. I want to find a way to pull out all rows with that value, but also create an NA or 0 row for groups that don't contain it.

Ex:

x1 <- c('1','1','1','1','1','2','2','2','2','2','3','3','3','3','3')
x2 <- c('a','b','c','d','e','b','c','d','e','f','a','b','d','e','f')
df <- data.frame(x1,x2)

df %>% group_by(x1) %>%
  filter(x2 =="a")

this returns:

  x1    x2   
  <fct> <fct>
1 1     a    
2 3     a  

but I want it to return:

  x1    x2   
  <fct> <fct>
1 1     a    
2 2     NA
3 3     a  

Obviously the real code is much more complicated, so I'm looking for the best way to keep these empty groups in a reproducible way.

PS - I would like to stay in dplyr to keep things smooth in a function chain.

Thanks!

Upvotes: 4

Views: 977

Answers (5)

John P. S.

Reputation: 383

This happens because of the way dplyr was written. According to Hadley Wickham (the package creator), to keep NA values you should declare that you want them explicitly. As he said in this issue on GitHub, you should filter(a == x | is.na(a)). In your case you would use the following:

df %>% group_by(x1) %>%
  filter(x2 == "a" | is.na(x2))

That will return this as a result:

  x1    x2   
 <fct> <fct>
1 1     a    
2 2     NA
3 3     a  

In this code you're asking R for all rows in which x2 is equal to "a", as well as those in which x2 is NA.

Upvotes: 1

ThomasIsCoding

Reputation: 102920

How about a base R solution using aggregate(), like below?

dfout <- aggregate(x2 ~ x1, df, function(v) ifelse("a" %in% v, "a", NA))

or

dfout <- aggregate(x2 ~ x1, df, function(v) v[match("a", v)])

such that

> dfout
  x1   x2
1  1    a
2  2 <NA>
3  3    a

Upvotes: 0

dww

Reputation: 31454

As you did not restrict this to dplyr solutions only, here's one option with library(data.table):

setDT(df)
df[, .(x2 = x2[match('a', x2)]), x1]
#    x1   x2
# 1:  1    a
# 2:  2 <NA>
# 3:  3    a

Upvotes: 1

tmfmnk

Reputation: 40181

One dplyr option could be:

df %>%
 group_by(x1) %>%
 slice(which.max(x2 == "a")) %>%
 mutate(x2 = replace(x2, x2 != "a", NA_complex_))

  x1    x2   
  <fct> <fct>
1 1     a    
2 2     <NA> 
3 3     a   

If it's relevant to have multiple target values per group:

df %>%
 group_by(x1) %>%
 filter(x2 == "a") %>%
 bind_rows(df %>%
            group_by(x1) %>%
            filter(all(x2 != "a")) %>%
            slice(1) %>%
            mutate(x2 = replace(x2, x2 != "a", NA_complex_)))

Upvotes: 1

akrun

Reputation: 887991

We can use complete after the filter step to get the missing combinations. By default, all the other columns will be filled with NA (this can be changed to a custom value with the fill argument).

library(dplyr)
library(tidyr)
df %>% 
  filter(x2 == 'a') %>% 
  complete(x1 = unique(df$x1))
# A tibble: 3 x 2
#  x1    x2   
#  <fct> <fct>
#1 1     a    
#2 2     <NA> 
#3 3     a    
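To illustrate the fill argument mentioned above, here is a minimal sketch. It assumes you convert x2 to character first so the custom fill value isn't constrained by the existing factor levels; the value "none" is just an illustrative placeholder.

df %>% 
  # placeholder fill value "none"; convert to character so it isn't
  # restricted to the factor levels already present in x2
  mutate(x2 = as.character(x2)) %>% 
  filter(x2 == 'a') %>% 
  complete(x1 = unique(df$x1), fill = list(x2 = "none"))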

Another option is match:

df %>% 
     group_by(x1) %>% 
     summarise(x2 = x2[match('a', x2)])

If there are many columns, then mutate 'x2' with match and then slice the first row:

df %>%
   group_by(x1) %>%
   mutate(x2 = x2[match('a', x2)]) %>%
   slice(1)

Upvotes: 0
