Reputation: 409
Title is self-explanatory. Looking to calculate percent NA by ID group in R. There are lots of posts on calculating NA by variable column but almost nothing on doing it by row groups.
Upvotes: 0
Views: 704
Reputation: 886938
If there are multiple columns, after grouping by 'ID', use summarise_at to loop over the columns, create a logical vector with is.na, get the mean, and multiply by 100
library(dplyr)
df1 %>%
  group_by(ID) %>%
  summarise_at(vars(-group_cols()), ~ 100 * mean(is.na(.)))
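For a concrete check, here is a minimal sketch on a made-up data frame (the df1 below, with columns x and y, is hypothetical and not from the question). It uses across(), which dplyr 1.0.0 and later provide as the replacement for summarise_at; the result should match the call above.

```r
library(dplyr)

# Hypothetical example data: two ID groups, two measured columns
df1 <- data.frame(
  ID = c("a", "a", "a", "a", "b", "b"),
  x  = c(1, NA, 3, NA, 5, 6),
  y  = c(NA, 2, 3, 4, NA, NA)
)

# across() never selects grouping columns, so everything() covers x and y only
res <- df1 %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ 100 * mean(is.na(.x))))
res
# ID "a": x = 50, y = 25; ID "b": x = 0, y = 100
```

mean() on a logical vector gives the proportion of TRUE values, which is why no explicit count or division is needed.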
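If we want a single percentage per 'ID' across all other variables, reshape to long format with pivot_longer first (this assumes the other columns share a compatible type)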
library(tidyr)
df1 %>%
  pivot_longer(cols = -ID) %>%
  group_by(ID) %>%
  summarise(Perc = 100 * mean(is.na(value)))
Or with aggregate from base R (na.action = na.pass keeps the rows with NA, which the formula method would otherwise drop)
aggregate(.~ ID, df1, FUN = function(x) 100 * mean(is.na(x)), na.action = na.pass)
Or, to get the percentage across all the other columns in base R, unlist those columns, create a table with the resulting logical vector and the 'ID' column, and use prop.table to get the percentage within each 'ID'
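vals <- df1[setdiff(names(df1), "ID")]
# unlist() stacks the columns, so repeat ID once per column to keep it aligned
tbl <- table(ID = rep(df1$ID, times = ncol(vals)), is_na = is.na(unlist(vals)))
100 * prop.table(tbl, margin = 1)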
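As a cross-check on the across-all-columns figure, here is a small base R sketch using rowMeans and tapply instead of table. The data frame and its columns x and y are made up for illustration. It relies on every row having the same number of value columns, so the mean of the row-wise NA proportions within an ID equals the overall NA proportion for that ID.

```r
# Hypothetical example data: two ID groups, two measured columns
df1 <- data.frame(
  ID = c("a", "a", "a", "a", "b", "b"),
  x  = c(1, NA, 3, NA, 5, 6),
  y  = c(NA, 2, 3, 4, NA, NA)
)

value_cols <- setdiff(names(df1), "ID")

# Proportion of NA cells in each row, then averaged within each ID
row_na <- rowMeans(is.na(df1[value_cols]))
perc <- 100 * tapply(row_na, df1$ID, mean)
perc
# a: 37.5 (3 of 8 cells are NA), b: 50 (2 of 4 cells are NA)
```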
Upvotes: 2