Reputation: 3441
I am trying to summarize data with NA values and am using the ddply function.
For example, using the data included below,
set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC"), 100),
ValOne = sample(c(1, 0, NA), replace = T, 300),
ValTwo = sample(c(1,NA), replace = T, 300),
VarThree = sample(c("Thanks", "alot"), replace = T, 300))
> head(dat)
IndID ValOne ValTwo
1 AAA 1 NA
2 BBB NA 1
3 CCC 0 NA
4 AAA NA NA
5 BBB NA NA
6 CCC 1 1
I want to calculate the number of times that each individual has a 1 in the ValOne and ValTwo column I am using the code below to create a new data.frame
and summarize the data by IndID and use both length
and sum
functions.
library(plyr)
tbl <- ddply(dat, "IndID", summarise,
ColOne = length(dat$ValOne[dat$ColOne == 1]),
NumHighHDOP = sum(dat$ValTwo[dat$ValTwo == 1], na.rm = T))
As seen below,
IndID ColOne NumHighHDOP
1 AAA 0 155
2 BBB 0 155
3 CCC 0 155
the resulting table summarizes the data for the entire data.frame
and not for each individual.
Both approaches (length
and sum
) are struggling with the NAs in the data.frame.
Any suggestions would be appreciated.
EDIT With the new data set including a factor. Is it also possible to calculate the number of "Thanks" for each individual?
Upvotes: 2
Views: 4738
Reputation: 887068
We can use dplyr
. We group by 'IndID', and get the count of 1 for each column with summarise_each
. To remove the NA
elements, either na.omit
or use a logical condition to output TRUE
only for 1.
library(dplyr)
dat %>%
group_by(IndID) %>%
summarise_each(funs(sum(.==1 & !is.na(.))))
#or
#summarise_each(funs(sum(na.omit(.)==1)))
Based on the updated dataset in the OP's post, if we want to count the 'Thanks' in the third column, we can use %in%
(assuming that 'Thanks' is not present in the other two columns and 1 not in the last column).
dat %>%
group_by(IndID) %>%
summarise_each(funs(sum(na.omit(.) %in% c(1, 'Thanks'))))
Upvotes: 3