Reputation: 156

Exclude groups with NAs in tidy dataset

I have a tidy tibble with a value column identified by 4 ID columns.

 > MWA
# A tibble: 16 x 5
# Groups:   Dir [2]
      VP   Con   Dir   Seg time_seg
   <int> <int> <int> <int>    <int>
 1    10     2     1     1     1810
 2    10     2     1     2      260
 3    10     2     1     3      540
 4    10     2     1     4     1470
 5    10     2     1     5      460
 6    10     2     1     6      690
 7    10     2     1     7      760
 8    10     2     1     8       NA
 9    10     2     2     1      320
10    10     2     2     2     1110
11    10     2     2     3      450
12    10     2     2     4      600
13    10     2     2     5     1680
14    10     2     2     6      730
15    10     2     2     7      850
16    10     2     2     8      840

The dput to reproduce is

> dput(MWA)
structure(list(VP = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Con = c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Dir = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    Seg = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L), time_seg = c(1810L, 260L, 540L, 1470L, 460L, 
    690L, 760L, NA, 320L, 1110L, 450L, 600L, 1680L, 730L, 850L, 
    840L)), row.names = c(NA, -16L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), vars = "Dir", drop = TRUE, indices = list(
    0:7, 8:15), group_sizes = c(8L, 8L), biggest_group_size = 8L, labels = structure(list(
    Dir = 1:2), row.names = c(NA, -2L), class = "data.frame", vars = "Dir", drop = TRUE))

They stem from a larger data set, where they have been grouped by VP, Con and finally Dir.

As you can see, in tibble row 10 there is a NA.

I now want to exclude the whole Dir group (so rows 1 trough 8), based on this condition that this one value is missing using dplyr.

Using the filter with is.na or complete.cases only removes the row with the NA, not the complete group (which is one "case" in this dataset).

Upvotes: 1

Answers (3)

akrun

Reputation: 887951

There is anyNA in base R

library(dplyr)
MWA %>%
    group_by(Dir) %>%
    filter(!anyNA(time_seg))
# A tibble: 8 x 5
# Groups:   Dir [1]
#     VP   Con   Dir   Seg time_seg
#  <int> <int> <int> <int>    <int>
#1    10     2     2     1      320
#2    10     2     2     2     1110
#3    10     2     2     3      450
#4    10     2     2     4      600
#5    10     2     2     5     1680
#6    10     2     2     6      730
#7    10     2     2     7      850
#8    10     2     2     8      840

Upvotes: 2

paqmo

Reputation: 3739

Using all() will evaluate the entire group, so you can skip the mutate step.

MWA %>% 
  group_by(Dir) %>% 
  filter(all(!is.na(time_seg)))

# A tibble: 8 x 5
# Groups:   Dir [1]
     VP   Con   Dir   Seg time_seg
  <int> <int> <int> <int>    <int>
1    10     2     2     1      320
2    10     2     2     2     1110
3    10     2     2     3      450
4    10     2     2     4      600
5    10     2     2     5     1680
6    10     2     2     6      730
7    10     2     2     7      850
8    10     2     2     8      840

Upvotes: 1

kath

Reputation: 7734

You can first check whether there is any missing value in the specific column and then exclude the whole group.

library(dplyr)

MWA %>% 
  group_by(VP, Con, Dir) %>% 
  mutate(any_na = any(is.na(time_seg))) %>% 
  filter(!any_na)

# A tibble: 8 x 6
# Groups:   VP, Con, Dir [1]
#     VP   Con   Dir   Seg time_seg any_na
#   <int> <int> <int> <int>    <int> <lgl> 
# 1    10     2     2     1      320 FALSE 
# 2    10     2     2     2     1110 FALSE 
# 3    10     2     2     3      450 FALSE 
# 4    10     2     2     4      600 FALSE 
# 5    10     2     2     5     1680 FALSE 
# 6    10     2     2     6      730 FALSE 
# 7    10     2     2     7      850 FALSE 
# 8    10     2     2     8      840 FALSE

Upvotes: 2

Exclude groups with NAs in tidy dataset

Answers (3)

Related Questions