I have a data frame that looks something like this:
x y
1 a
1 b
1 c
1 NA
1 NA
2 d
2 e
2 NA
2 NA
My desired output is a data frame that shows the number of complete cases of Y (that is, the non-NA values) for each X. For example, if Y has 2500 complete observations for X = 1 and 557 for X = 2, I should get this simple data frame:
x y(c.cases)
1 2500
2 557
Currently my function works, but only for a single X. When I pass a range of X values (e.g. 30:25), I get the total of all the Ys specified instead of a separate count of complete observations for each X. This is an outline of my function:
complete <- function(){
  files <- file.list()
  dat <- c() #Creates an empty vector
  Y <- c() #Empty vector that will list down the Ys
  result <- c()
  for(i in c(X)){
    dat <- rbind(dat, read.csv(files[i]))
  }
  dat_subset_Y <- dat[which(dat[, 'X'] %in% x), ]
  Y <- c(Y, sum(complete.cases(dat)))
  result <- cbind(X, Y)
  print(result)
}
There are no errors or warning messages, only wrong results when X is a range.
Upvotes: 0
Views: 183
Reputation: 5856
There is no need for that loop.
library(dplyr)
df %>%
  filter(complete.cases(.)) %>%
  group_by(x) %>%
  summarise(sumy = length(y))
Or
df %>%
  group_by(x) %>%
  summarise(sumy = sum(!is.na(y)))
Upvotes: 2
Reputation: 886968
We can use data.table. We convert the 'data.frame' to a 'data.table' (setDT(df1)) and, grouped by 'x', get the sum of all non-NA elements (!is.na(y)).
library(data.table)
setDT(df1)[, list(y=sum(!is.na(y))), by = x]
Or another option is table
with(df1, table(x, !is.na(y)))
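If a two-column data frame like the one in the question is wanted without extra packages, a base R sketch along these lines should also work. It assumes the question's sample data is stored in df1; the names counts and result are only illustrative.

```r
# Sample data from the question, stored as df1 (assumed name)
df1 <- data.frame(
  x = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  y = c("a", "b", "c", NA, NA, "d", "e", NA, NA)
)

# For each x, sum the logical vector !is.na(y), i.e. count the non-NA y values
counts <- tapply(!is.na(df1$y), df1$x, sum)

# Turn the named result into a plain two-column data frame
result <- data.frame(x = as.numeric(names(counts)), y = as.vector(counts))
print(result)
# x = 1 has 3 non-NA values, x = 2 has 2
```

Unlike filtering on complete cases first, this keeps an x whose y values are all NA and reports a count of 0 for it.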
Upvotes: 3