Reputation: 43
Here is an example of the original data:
ID Test1 Test2 Test3 Test4
1 0 0 NA 1.2
1 0 NA NA 3.0
1 NA NA NA 0
2 0 0 0 0
2 0 0 NA NA
I want to count how many non-NA occurrences (so including 0) for each ID and replace that columns value with that number. Producing this:
ID Test1 Test2 Test3 Test4
1 2 1 NA 3
2 2 2 1 1
I don't know if I need to use the sqldf package in R. I tried coercing the data frame to a data table and reshaping it but it did not work.
df <- x %>% melt(idvars='ID')
Thanks for the help.
Upvotes: 1
Views: 830
Reputation: 270045
Below we discuss solutions using the two packages mentioned in the question.
1) sqldf To use the sqldf package referenced in the question using the input defined reproducibly in the Note at the end:
library(sqldf)
sqldf("select ID,
nullif(count(Test1), 0) Test1,
nullif(count(Test2), 0) Test2,
nullif(count(Test3), 0) Test3,
nullif(count(Test4), 0) Test4
from DF
group by ID")
giving:
ID Test1 Test2 Test3 Test4
1 1 2 1 NA 3
2 2 2 2 1 1
nullif(count(test1), 0)
can be shortened to just count(test1)
if it is ok to report a 0 for an ID that is all NA and similarly for the other test* columns.
1a) If in reality there are many columns, not just 4, or you don't like repeating part of the select
we can build up the string and then insert it like this:
testNames <- names(DF)[-1]
select <- toString(sprintf("nullif(count(%s), 0) %s", testNames, testNames))
library(sqldf)
fn$sqldf("select ID, $select
from DF
group by ID")
Add the verbose = TRUE
argument to the sqldf
call to see that the same string is actually sent to the back end.
If it is ok to report 0 instead of NA then we could simplify the select <- ...
to:
select <- toString(sprintf("count(%s) %s", testNames, testNames))
2) reshape2 To use melt
as in the code attempt in the question:
library(magrittr)
library(reshape2)
count <- function(x) if (all(is.na(x))) NA_integer_ else sum(!is.na(x))
DF %>%
melt(id.vars = "ID") %>%
dcast(ID ~ variable, count)
If it is OK to report 0 for any ID that is all NA then count could be simplified to:
count <- function(x) sum(!is.na(x))
Lines <- "ID Test1 Test2 Test3 Test4
1 0 0 NA 1.2
1 0 NA NA 3.0
1 NA NA NA 0
2 0 0 0 0
2 0 0 NA NA"
DF <- read.table(text = Lines, header = TRUE)
Upvotes: 2
Reputation: 887711
We can do a group by sum
on the logical vector
library(dplyr)
df1 %>%
group_by(ID) %>%
summarise_all(funs(na_if(sum(!is.na(.)), 0)))
# A tibble: 2 x 5
# ID Test1 Test2 Test3 Test4
# <int> <int> <int> <int> <int>
#1 1 2 1 NA 3
#2 2 2 2 1 1
Or use aggregate
from base R
aggregate(.~ ID, df1, FUN = function(x) sum(!is.na(x)), na.action = NULL)
Or with rowsum
rowsum(+(!is.na(df1[-1])), df1$ID)
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L), Test1 = c(0L, 0L,
NA, 0L, 0L), Test2 = c(0L, NA, NA, 0L, 0L), Test3 = c(NA, NA,
NA, 0L, NA), Test4 = c(1.2, 3, 0, 0, NA)), class = "data.frame",
row.names = c(NA, -5L))
Upvotes: 3