Dom42
Dom42

Reputation: 156

Exclude groups with NAs in tidy dataset

I have a tidy tibble with a value column identified by 4 ID columns.

 > MWA
# A tibble: 16 x 5
# Groups:   Dir [2]
      VP   Con   Dir   Seg time_seg
   <int> <int> <int> <int>    <int>
 1    10     2     1     1     1810
 2    10     2     1     2      260
 3    10     2     1     3      540
 4    10     2     1     4     1470
 5    10     2     1     5      460
 6    10     2     1     6      690
 7    10     2     1     7      760
 8    10     2     1     8       NA
 9    10     2     2     1      320
10    10     2     2     2     1110
11    10     2     2     3      450
12    10     2     2     4      600
13    10     2     2     5     1680
14    10     2     2     6      730
15    10     2     2     7      850
16    10     2     2     8      840

The dput to reproduce is

> dput(MWA)
structure(list(VP = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Con = c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Dir = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    Seg = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L), time_seg = c(1810L, 260L, 540L, 1470L, 460L, 
    690L, 760L, NA, 320L, 1110L, 450L, 600L, 1680L, 730L, 850L, 
    840L)), row.names = c(NA, -16L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), vars = "Dir", drop = TRUE, indices = list(
    0:7, 8:15), group_sizes = c(8L, 8L), biggest_group_size = 8L, labels = structure(list(
    Dir = 1:2), row.names = c(NA, -2L), class = "data.frame", vars = "Dir", drop = TRUE))

They stem from a larger data set, where they have been grouped by VP, Con and finally Dir.

As you can see, in tibble row 10 there is a NA.

I now want to exclude the whole Dir group (so rows 1 trough 8), based on this condition that this one value is missing using dplyr.

Using the filter with is.na or complete.cases only removes the row with the NA, not the complete group (which is one "case" in this dataset).

Upvotes: 1

Views: 247

Answers (3)

akrun
akrun

Reputation: 887088

There is anyNA in base R

library(dplyr)
MWA %>%
    group_by(Dir) %>%
    filter(!anyNA(time_seg))
# A tibble: 8 x 5
# Groups:   Dir [1]
#     VP   Con   Dir   Seg time_seg
#  <int> <int> <int> <int>    <int>
#1    10     2     2     1      320
#2    10     2     2     2     1110
#3    10     2     2     3      450
#4    10     2     2     4      600
#5    10     2     2     5     1680
#6    10     2     2     6      730
#7    10     2     2     7      850
#8    10     2     2     8      840

Upvotes: 2

paqmo
paqmo

Reputation: 3729

Using all() will evaluate the entire group, so you can skip the mutate step.

MWA %>% 
  group_by(Dir) %>% 
  filter(all(!is.na(time_seg)))

# A tibble: 8 x 5
# Groups:   Dir [1]
     VP   Con   Dir   Seg time_seg
  <int> <int> <int> <int>    <int>
1    10     2     2     1      320
2    10     2     2     2     1110
3    10     2     2     3      450
4    10     2     2     4      600
5    10     2     2     5     1680
6    10     2     2     6      730
7    10     2     2     7      850
8    10     2     2     8      840

Upvotes: 1

kath
kath

Reputation: 7724

You can first check whether there is any missing value in the specific column and then exclude the whole group.

library(dplyr)

MWA %>% 
  group_by(VP, Con, Dir) %>% 
  mutate(any_na = any(is.na(time_seg))) %>% 
  filter(!any_na)

# A tibble: 8 x 6
# Groups:   VP, Con, Dir [1]
#     VP   Con   Dir   Seg time_seg any_na
#   <int> <int> <int> <int>    <int> <lgl> 
# 1    10     2     2     1      320 FALSE 
# 2    10     2     2     2     1110 FALSE 
# 3    10     2     2     3      450 FALSE 
# 4    10     2     2     4      600 FALSE 
# 5    10     2     2     5     1680 FALSE 
# 6    10     2     2     6      730 FALSE 
# 7    10     2     2     7      850 FALSE 
# 8    10     2     2     8      840 FALSE 

Upvotes: 2

Related Questions