Reputation: 7928
My data looks something like this:
df1 <- data.frame(
Z = sample(LETTERS[1:5], size = 10000, replace = T),
X1 = sample(c(1:10,NA), 10000, replace = T),
X2 = sample(c(1:25,NA), 10000, replace = T),
X3 = sample(c(1:5,NA), 10000, replace = T)
)
I can count the missing variables with:
data.frame("Total Missing" = colSums(is.na(df1)))
But, I would like to this by Z
. That is, the number of missing X1-3 for each value of Z.
I tried this
df1 %>% group_by(Z) %>% summarise('Total Missing' = colSums(is.na(df1)))
but it does not work as I expected.
Upvotes: 3
Views: 614
Reputation: 70256
You can use summarise_each
:
df1 %>%
group_by(Z) %>%
summarise_each(funs(sum(is.na(.))))
#Source: local data frame [5 x 4]
#
# Z X1 X2 X3
# (fctr) (int) (int) (int)
#1 A 169 77 334
#2 B 170 77 316
#3 C 159 78 348
#4 D 181 79 326
#5 E 174 69 341
Note that you can specify inside summarise_each
which columns to apply the function to (default is all columns except grouping columns) or which columns the function should not be applied to. It may also be interesting for you to note that like summarise_each
to summarise
, there's also mutate_each
as the complement to mutate
if you want to apply functions to all columns without summarising the result.
The obligatory data.table equivalent is:
library(data.table)
setDT(df1)[, lapply(.SD, function(x) sum(is.na(x))), by = Z]
# Z X1 X2 X3
#1: D 181 79 326
#2: C 159 78 348
#3: B 170 77 316
#4: A 169 77 334
#5: E 174 69 341
And in base R you could use a split/apply/combine approach like the following:
do.call(rbind,
lapply(
split(df1, df1$Z), function(dd) {
colSums(is.na(dd[-1]))
}))
# X1 X2 X3
#A 169 77 334
#B 170 77 316
#C 159 78 348
#D 181 79 326
#E 174 69 341
Or, also in base R, you can use aggregate
:
aggregate(df1[-1], list(df1$Z), FUN = function(y) sum(is.na(y)))
aggregate(. ~ Z, df1, FUN = function(y) sum(is.na(y)), na.action = "na.pass") # formula interface
Upvotes: 5