Reputation: 1067
I am trying to condense a grouped df, pulling out only rows that contain a certain value, but that value isn't reflected in all groups. I want to find a way to pull out all rows with that value, but also make a NA or 0 row for groups not containing that value.
Ex:
x1 <- c('1','1','1','1','1','2','2','2','2','2','3','3','3','3','3')
x2 <- c('a','b','c','d','e','b','c','d','e','f','a','b','d','e','f')
df <- data.frame(x1,x2)
df %>% group_by(x1) %>%
filter(x2 =="a")
this returns:
x1 x2
<fct> <fct>
1 1 a
2 3 a
but I want it to return:
x1 x2
<fct> <fct>
1 1 a
2 2 NA
3 3 a
Obviously the real code is much more complicated, so I'm looking for the best way to keep these empty groups in a reproducible way.
PS - I would like to stay in dplyr to keep smooth in a function chain
Thanks!
Upvotes: 4
Views: 977
Reputation: 383
This happens because of the way Dplyr was written.
According to Hadley Wickham (the Package Creator) to maintain NA values you should declare that you want them explicitly. As he said in this issue on github, you should filter(a == x | is.na(a))
. In your case you use the following:
df %>% group_by(x1) %>%
filter(x2 =="a" | is.na(x2)
That you'll return you this as a result:
x1 x2
<fct> <fct>
1 1 a
2 2 NA
3 3 a
In this code you're asking to R all rows in which x2 is equal to "a" and also those in which x2 is NA.
Upvotes: 1
Reputation: 102920
How about the base R solution using aggregate()
like below?
dfout <- aggregate(x2~x1,df,function(v) ifelse("a" %in% v,"a",NA))
or
dfout <- aggregate(x2~x1,df,function(v) v[match("a", v)])
such that
> dfout
x1 x2
1 1 a
2 2 <NA>
3 3 a
Upvotes: 0
Reputation: 31454
As you did not specify dplyr solutions only, here's one option with library(data.table)
setDT(df)
df[, .(x2 = x2[match('a', x2)]), x1]
# x1 x2
# 1: 1 a
# 2: 2 <NA>
# 3: 3 a
Upvotes: 1
Reputation: 40181
One dplyr
option could be:
df %>%
group_by(x1) %>%
slice(which.max(x2 == "a")) %>%
mutate(x2 = replace(x2, x2 != "a", NA_complex_))
x1 x2
<fct> <fct>
1 1 a
2 2 <NA>
3 3 a
If it's relevant to have multiple target values per group:
df %>%
group_by(x1) %>%
filter(x2 == "a") %>%
bind_rows(df %>%
group_by(x1) %>%
filter(all(x2 != "a")) %>%
slice(1) %>%
mutate(x2 = replace(x2, x2 != "a", NA_complex_)))
Upvotes: 1
Reputation: 887991
We can use complete
after the filter
step to get the missing combinations. By default, all the other columns will be filled with NA
(it can be made to custom value with fill
argument)
library(dplyr)
library(tidyr)
df %>%
filter(x2 == 'a') %>%
complete(x1 = unique(df$x1))
# A tibble: 3 x 2
# x1 x2
# <fct> <fct>
#1 1 a
#2 2 <NA>
#3 3 a
Another option is match
df %>%
group_by(x1) %>%
summarise(x2 = x2[match('a', x2)])
If there are many columns, then mutate
'x2' with match
and then slice
the first row
df %>%
group_by(x1) %>%
mutate(x2 = x2[match('a', x2)]) %>%
slice(1)
Upvotes: 0