Dima

Reputation: 146

Select and remove NAs within a group on condition

I have the following data frame:

   id participate grade year
1   1          NA     4 1982
2   1           1     4 1982
3   1           4     4 1982
4   4          NA    NA 1987
5   5          NA    NA 1986
6   5          NA     1 1986
7   5          NA     1 1986
8   7          NA     2 1984
9   7           4     2 1984
10  7           1     2 1984
11  9          NA     1 1987
12  9           1     1 1987
13 10          NA    NA 1984
14 10          NA     2 1984
15 10           4     2 1984
16 11          NA     4 1985
17 11           1     4 1985
18 13          NA     3 1985
19 13           1     3 1985

My goal is to identify and delete, per group (id), the rows where "participate" is NA, but only if "participate" is filled in for other rows within the same group.

In this case, that means deleting row 1 for id=1. For id=4 nothing is deleted, because the group contains no other information. The same holds for id=5. Likewise, rows 8, 11, 13, 14, etc. should be deleted.

Here is the desired output.

      id participate grade  year
1      1           1     4  1982
2      1           4     4  1982
3      4          NA    NA  1987
4      5          NA    NA  1986
5      5          NA     1  1986
6      5          NA     1  1986
7      7           4     2  1984
8      7           1     2  1984
9      9           1     1  1987
10    10           4     2  1984
11    11           1     4  1985
12    13           1     3  1985

Upvotes: 0

Views: 64

Answers (1)

www

Reputation: 39174

# Load package
library(tidyverse)

# Create example dataset
dat <- tibble(id = c(1L, 1L, 1L, 4L, 5L,
                     5L, 5L, 7L, 7L, 7L,
                     9L, 9L, 10L, 10L, 10L,
                     11L, 11L, 13L, 13L),
              participate = c(NA, 1L, 4L, NA, NA,
                              NA, NA, NA, 4L, 1L,
                              NA, 1L, NA, NA, 4L,
                              NA, 1L, NA, 1L),
              grade = c(4L, 4L, 4L, NA, NA,
                        1L, 1L, 2L, 2L, 2L,
                        1L, 1L, NA, 2L, 2L,
                        4L, 4L, 3L, 3L),
              year = c(1982, 1982, 1982, 1987, 1986,
                       1986, 1986, 1984, 1984, 1984,
                       1987, 1987, 1984, 1984, 1984,
                       1985, 1985, 1985, 1985))

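# Filter the data: within each id, keep a row if participate is not NA,
# or if every participate value in that group is NA
dat2 <- dat %>%
  group_by(id) %>%
  filter(!is.na(participate) | all(is.na(participate)))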

# See the result
dat2

Source: local data frame [12 x 4]
Groups: id [8]

      id participate grade  year
   <int>       <int> <int> <dbl>
1      1           1     4  1982
2      1           4     4  1982
3      4          NA    NA  1987
4      5          NA    NA  1986
5      5          NA     1  1986
6      5          NA     1  1986
7      7           4     2  1984
8      7           1     2  1984
9      9           1     1  1987
10    10           4     2  1984
11    11           1     4  1985
12    13           1     3  1985
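
If you would rather not load the tidyverse, the same condition can probably be written in base R. The following is only a sketch under the assumption that the data are in a plain data.frame with the column names from the question; the helper names `all_na` and `keep` are mine and purely illustrative. `ave()` applies `all()` within each id and recycles the result to every row of that group.

```r
# Same example data as in the question, as a plain data.frame
dat <- data.frame(
  id = c(1, 1, 1, 4, 5, 5, 5, 7, 7, 7, 9, 9, 10, 10, 10, 11, 11, 13, 13),
  participate = c(NA, 1, 4, NA, NA, NA, NA, NA, 4, 1,
                  NA, 1, NA, NA, 4, NA, 1, NA, 1),
  grade = c(4, 4, 4, NA, NA, 1, 1, 2, 2, 2, 1, 1, NA, 2, 2, 4, 4, 3, 3),
  year = c(1982, 1982, 1982, 1987, 1986, 1986, 1986, 1984, 1984, 1984,
           1987, 1987, 1984, 1984, 1984, 1985, 1985, 1985, 1985)
)

# TRUE for every row whose id group has no non-NA participate value
all_na <- as.logical(ave(is.na(dat$participate), dat$id, FUN = all))

# Keep rows with a participate value, plus all rows of all-NA groups
keep <- !is.na(dat$participate) | all_na
dat2 <- dat[keep, ]
```

Note that `dat2` keeps the original row names here; call `rownames(dat2) <- NULL` if you want them renumbered as in the desired output.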

Upvotes: 1
