TiberiusGracchus2020

Reputation: 409

Calculate percent NA by ID variable in R

I want to calculate the percentage of NA values within each ID group in R. There are many posts on calculating the NA percentage per column, but almost nothing on doing it per group of rows.

Upvotes: 0

Views: 704

Answers (1)

akrun

Reputation: 886938

If there are multiple columns, group by 'ID' and then use summarise_at to loop over the remaining columns: is.na creates a logical vector, mean gives the proportion of NA values, and multiplying by 100 turns that into a percentage.

library(dplyr)
df1 %>%
   group_by(ID) %>%
   summarise_at(vars(-group_cols()), ~ 100 * mean(is.na(.)))
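
In dplyr 1.0.0 and later, summarise_at is superseded by across(), so the same idea can be sketched as below. The question does not show df1, so the toy data frame here (columns x and y) is purely an assumption for illustration.

```r
library(dplyr)

# Hypothetical toy data; the original post does not include df1
df1 <- data.frame(
  ID = c("a", "a", "b", "b"),
  x  = c(1, NA, 3, 4),
  y  = c(NA, NA, "u", "v")
)

# across(everything()) skips the grouping column inside summarise()
res <- df1 %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ 100 * mean(is.na(.x))))

res
```

With this data, group "a" has half of x and all of y missing, while group "b" has no missing values.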

If we want a single percentage per 'ID' across all the other columns combined, reshape to long format first:

library(tidyr)
df1 %>%
   pivot_longer(cols = -ID) %>%
   group_by(ID) %>%
   summarise(Perc = 100 * mean(is.na(value)))
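
pivot_longer stacks every non-ID column into one value column, so it can refuse to combine columns of incompatible types (for example numeric and character). A minimal base R sketch that avoids the reshape, assuming a hypothetical df1 with mixed column types, is:

```r
# Hypothetical toy data; the original post does not include df1
df1 <- data.frame(
  ID = c("a", "a", "b", "b"),
  x  = c(1, NA, 3, 4),
  y  = c(NA, NA, "u", "v")
)

cols <- setdiff(names(df1), "ID")

# is.na() on a data frame returns a logical matrix, so rowMeans() gives
# the share of NA cells in each row. Every row has the same number of
# cells, so averaging the row shares within an ID equals the overall share.
perc <- 100 * tapply(rowMeans(is.na(df1[cols])), df1$ID, mean)

perc
```

Here group "a" has three of its four cells missing, so its value is 75, and group "b" is 0.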

Or with aggregate from base R

aggregate(. ~ ID, df1, FUN = function(x) 100 * mean(is.na(x)), na.action = na.pass)
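
The na.action = na.pass argument matters here: the formula method of aggregate defaults to na.omit, which drops every row containing an NA before FUN is called. A small sketch of the difference, using a hypothetical all-numeric df1 (an assumption, since the post shows no data):

```r
# Hypothetical toy data; the original post does not include df1
df1 <- data.frame(
  ID = c("a", "a", "b", "b"),
  x  = c(1, NA, 3, 4),
  y  = c(NA, NA, 5, 6)
)

pct_na <- function(x) 100 * mean(is.na(x))

# Rows with NA are kept, so the percentages are meaningful
with_pass <- aggregate(. ~ ID, df1, FUN = pct_na, na.action = na.pass)

# Default na.action (na.omit): rows with any NA are removed first,
# so group "a" disappears and the remaining percentages are all 0
with_default <- aggregate(. ~ ID, df1, FUN = pct_na)

with_pass
with_default
```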

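Or, to get the percentage across all the other columns, unlist those columns, build a two-way table of 'ID' (repeated once per column) against the is.na logical vector, and use prop.table with margin 1 so that each 'ID' row sums to 100. The TRUE column is then the percent NA per 'ID'.

100 * prop.table(table(ID = rep(df1$ID, ncol(df1) - 1),
        value = is.na(unlist(df1[setdiff(names(df1), "ID")]))), 1)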

Upvotes: 2
