Monarch

Reputation: 135

How do I return a list of IDs based on missing values of another variable?

It's been a while since I used R, so apologies for asking what is probably a basic question.

I have a variable with data at baseline, 4 months, and 12 months for the same IDs. I'm trying to figure out which IDs have missing data at 4 months so I can delete those IDs from the entire dataset.

   ID     Baseline  4MOS     12MOS
123_ABC   53.5       NA       NA
456_DEF   45.1       32.5     12.2
789_GHI   45.4       NA       NA
923_JKL   88.4       11.1     23.1
734_BBB   45.4       20.1     NA
343_CHF   22.1       16.1     NA

I've gotten as far as identifying the row numbers where the 4-month data is missing:

clean <- which(is.na(df$4MOS))

This is the code I tried afterwards to return the IDs, but it just gave me the message "Error: attempt to apply non-function":

clean <- list(df$ID(which(is.na(df$4MOS))))

I'd gladly appreciate any help with this!

Upvotes: 2

Views: 1561

Answers (1)

NelsonGon

Reputation: 13319

EDIT:

To get the IDs with NAs (here we assume that all of an ID's X4MOS values are NA, not just any of them; in the latter case, use anyNA(X4MOS) instead of all(is.na(X4MOS))):

library(dplyr)

df %>% 
   group_by(ID) %>% 
   filter(all(is.na(X4MOS))) %>% 
   pull(ID)
[1] "123_ABC" "789_GHI"

Base R (no grouping):

df[is.na(df$X4MOS), "ID"]
[1] "123_ABC" "789_GHI"
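
As for the "attempt to apply non-function" error in the question: it most likely comes from writing df$ID(...) with parentheses, which makes R try to call df$ID as a function. Subsetting a vector needs square brackets instead. A minimal sketch, assuming the column is named X4MOS as in the Data section (read.csv() prepends X to names that start with a digit by default); the names missing_ids and same_ids are only illustrative:

```r
# Small stand-in for the question's data; the column name X4MOS is an assumption.
df <- data.frame(
  ID = c("123_ABC", "456_DEF", "789_GHI"),
  X4MOS = c(NA, 32.5, NA),
  stringsAsFactors = FALSE
)

# Square brackets subset the ID vector; df$ID(...) would try to call it instead.
missing_ids <- df$ID[which(is.na(df$X4MOS))]

# which() is optional here: a logical vector works as an index too.
same_ids <- df$ID[is.na(df$X4MOS)]

print(missing_ids)
```

Both forms should give the same character vector of IDs; wrapping the result in list() is not needed unless a list is specifically wanted.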

ORIGINAL: Returns the rows where X4MOS is not NA

A dplyr solution:

df %>% 
   group_by(ID) %>% 
   filter(!all(is.na(X4MOS)))
# A tibble: 4 x 4
# Groups:   ID [4]
  ID      Baseline X4MOS X12MOS
  <chr>      <dbl> <dbl>  <dbl>
1 456_DEF     45.1  32.5   12.2
2 923_JKL     88.4  11.1   23.1
3 734_BBB     45.4  20.1   NA  
4 343_CHF     22.1  16.1   NA 

With base R (no grouping):

df[!is.na(df$X4MOS), ]
       ID Baseline X4MOS X12MOS
2 456_DEF     45.1  32.5   12.2
4 923_JKL     88.4  11.1   23.1
5 734_BBB     45.4  20.1     NA
6 343_CHF     22.1  16.1     NA
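
If "the entire dataset" also includes other tables keyed by the same ID, the IDs with a missing 4-month value can be collected once and then excluded everywhere with %in%. This is only a sketch under that assumption: the second table other and its score column are hypothetical and not part of the question.

```r
# Wide table as in the question (column names assumed to be X4MOS / X12MOS).
df <- data.frame(
  ID = c("123_ABC", "456_DEF", "789_GHI", "923_JKL"),
  Baseline = c(53.5, 45.1, 45.4, 88.4),
  X4MOS = c(NA, 32.5, NA, 11.1),
  X12MOS = c(NA, 12.2, NA, 23.1),
  stringsAsFactors = FALSE
)

# IDs whose 4-month value is missing.
ids_to_drop <- df$ID[is.na(df$X4MOS)]

# Hypothetical second table that shares the ID column.
other <- data.frame(
  ID = c("123_ABC", "456_DEF", "923_JKL", "789_GHI"),
  score = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Keep only the rows whose ID is not in the drop list, in both tables.
df_clean <- df[!df$ID %in% ids_to_drop, ]
other_clean <- other[!other$ID %in% ids_to_drop, ]

print(other_clean)
```

The same !ID %in% ids_to_drop condition could be used inside dplyr::filter() if the tidyverse style is preferred.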

Data:

df <- structure(list(ID = c("123_ABC", "456_DEF", "789_GHI", "923_JKL", 
"734_BBB", "343_CHF"), Baseline = c(53.5, 45.1, 45.4, 88.4, 45.4, 
22.1), X4MOS = c(NA, 32.5, NA, 11.1, 20.1, 16.1), X12MOS = c(NA, 
12.2, NA, 23.1, NA, NA)), class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 2
