Reputation: 135
It's been a while since I used R, so apologies for asking what is probably a very basic question :s
I have a variable that has data in baseline, 4 months, and 12 months for the same IDs. I'm essentially trying to figure out which IDs have missing data in 4 months so I can delete those IDs from the entire dataset.
ID       Baseline  4MOS  12MOS
123_ABC  53.5      NA    NA
456_DEF  45.1      32.5  12.2
789_GHI  45.4      NA    NA
923_JKL  88.4      11.1  23.1
734_BBB  45.4      20.1  NA
343_CHF  22.1      16.1  NA
I've gotten as far as identifying the row number where there is missing 4 month data:
clean <- which(is.na(df$`4MOS`))
This is the code I tried afterwards to return the IDs, but it just gave me the message "Error: attempt to apply non-function":
clean <- list(df$ID(which(is.na(df$`4MOS`))))
I'd gladly appreciate any help with this!
Upvotes: 2
Views: 1561
Reputation: 13319
EDIT:
To get IDs with NAs (here we assume that all values are NA, not just any NA. In the latter case, use anyNA instead):
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(all(is.na(X4MOS))) %>%
  pull(ID)
[1] "123_ABC" "789_GHI"
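In the example data each ID has only one row, so all(is.na(...)) and anyNA give the same answer. The difference only shows up when an ID spans several rows. A minimal sketch of that, using a made-up long-format data frame (the names long and value are hypothetical, not from the question):

```r
library(dplyr)

# Hypothetical long-format data: "A" is complete, "B" is partly missing,
# "C" is entirely missing
long <- data.frame(
  ID = c("A", "A", "B", "B", "C", "C"),
  value = c(1.2, 3.4, NA, 5.6, NA, NA),
  stringsAsFactors = FALSE
)

# IDs where every value is NA
all_missing <- long %>%
  group_by(ID) %>%
  filter(all(is.na(value))) %>%
  pull(ID) %>%
  unique()

# IDs where at least one value is NA
any_missing <- long %>%
  group_by(ID) %>%
  filter(anyNA(value)) %>%
  pull(ID) %>%
  unique()

all_missing  # "C"
any_missing  # "B" "C"
```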
base (no grouping):
df[is.na(df$X4MOS), "ID"]
[1] "123_ABC" "789_GHI"
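As for the "attempt to apply non-function" error in the question: df$ID(...) uses parentheses, so R tries to call df$ID as a function. Square brackets index it instead. A rough sketch of the fixed attempt, followed by one possible way to drop those IDs from the whole dataset (missing_ids and df_clean are names made up for illustration; the data is the same as in the Data section below):

```r
# Same example data as in the Data section
df <- data.frame(
  ID = c("123_ABC", "456_DEF", "789_GHI", "923_JKL", "734_BBB", "343_CHF"),
  Baseline = c(53.5, 45.1, 45.4, 88.4, 45.4, 22.1),
  X4MOS = c(NA, 32.5, NA, 11.1, 20.1, 16.1),
  X12MOS = c(NA, 12.2, NA, 23.1, NA, NA),
  stringsAsFactors = FALSE
)

# Square brackets index df$ID; parentheses would try to call it as a function
missing_ids <- df$ID[which(is.na(df$X4MOS))]

# Keep only the rows whose ID is not among the IDs with missing 4-month data
df_clean <- df[!(df$ID %in% missing_ids), ]

missing_ids  # "123_ABC" "789_GHI"
```

Matching on ID with %in% rather than on row position should also work if the same ID appears in more than one row.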
ORIGINAL: Returns the rows for IDs whose 4-month values are not all NA
A dplyr solution:
df %>%
group_by(ID) %>%
filter(!all(is.na(X4MOS)))
# A tibble: 4 x 4
# Groups:   ID [4]
  ID      Baseline X4MOS X12MOS
  <chr>      <dbl> <dbl>  <dbl>
1 456_DEF     45.1  32.5   12.2
2 923_JKL     88.4  11.1   23.1
3 734_BBB     45.4  20.1   NA
4 343_CHF     22.1  16.1   NA
With base (no grouping):
df[!is.na(df$X4MOS), ]
       ID Baseline X4MOS X12MOS
2 456_DEF     45.1  32.5   12.2
4 923_JKL     88.4  11.1   23.1
5 734_BBB     45.4  20.1     NA
6 343_CHF     22.1  16.1     NA
Data:
df <- structure(list(ID = c("123_ABC", "456_DEF", "789_GHI", "923_JKL",
"734_BBB", "343_CHF"), Baseline = c(53.5, 45.1, 45.4, 88.4, 45.4,
22.1), X4MOS = c(NA, 32.5, NA, 11.1, 20.1, 16.1), X12MOS = c(NA,
12.2, NA, 23.1, NA, NA)), class = "data.frame", row.names = c(NA,
-6L))
Upvotes: 2