How to subset a large data frame through FOR loops and print the desired result?

Question

I have a data frame that looks something like this:

x    y
1    a
1    b
1    c
1    NA
1    NA
2    d
2    e
2    NA
2    NA

My desired output is a data frame that shows, for each X, the number of complete cases of Y (that is, the non-NA values). For example, if Y has 2500 complete observations for X = 1 and 557 for X = 2, I should get this simple data frame:

x    y(c.cases)
1    2500
2    557

Currently my function works for a single X, but when I pass a range of Xs (e.g. 30:25) I get the total across all the specified Xs instead of a separate count of complete observations for each X. This is an outline of my function:

complete <- function(){
    files <- file.list()
    dat <- c()    # Creates an empty vector
    Y <- c()      # Empty vector that will list down the Ys
    result <- c()
    for(i in c(X)){
        dat <- rbind(dat, read.csv(files[i]))
    }
    dat_subset_Y <- dat[which(dat[, 'X'] %in% x), ]
    Y <- c(Y, sum(complete.cases(dat)))
    result <- cbind(X, Y)
    print(result)
}

There are no errors or warning messages, only wrong results when X is a range.

Paulo E. Cardoso · Accepted Answer

There is no need for that loop.

library(dplyr)
# Drop rows containing any NA, then count the remaining rows per x
df %>%
  filter(complete.cases(.)) %>%
  group_by(x) %>%
  summarise(sumy = n())

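Or, counting the non-NA values of y directly:

# Keeps every x, including groups where all y are NA (count 0)
df %>%
  group_by(x) %>%
  summarise(sumy = sum(!is.na(y)))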
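
If you do want to keep the loop, the likely cause of the wrong result is that sum(complete.cases(dat)) in the question runs only once, after the loop has bound all files together, so it returns a single total. Below is a minimal sketch of a loop that counts inside each iteration instead. It assumes each file holds the rows for one X; the function name count_complete, its arguments, and the temporary CSV files are made up for illustration and are not part of the original code.

```r
# Hypothetical helper: counts complete rows per file, one file per id.
count_complete <- function(files, ids) {
  counts <- integer(length(ids))           # one slot per requested id
  for (k in seq_along(ids)) {
    dat <- read.csv(files[ids[k]])         # read only this id's file
    counts[k] <- sum(complete.cases(dat))  # count inside the loop
  }
  data.frame(x = ids, complete = counts)
}

# Illustrative data: two temporary CSV files standing in for the real ones
tmp <- file.path(tempdir(), c("001.csv", "002.csv"))
write.csv(data.frame(x = 1, y = c("a", "b", "c", NA, NA)), tmp[1], row.names = FALSE)
write.csv(data.frame(x = 2, y = c("d", "e", NA, NA)), tmp[2], row.names = FALSE)

# A descending range, as in the question, gives one row per id in that order
result <- count_complete(tmp, 2:1)
print(result)
```

Storing each count in its own slot is what keeps the ids separate; the dplyr versions above do the same job without the bookkeeping.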
