user5250820

How to subset a large data frame with a for loop and print the desired result?

I have a data frame that looks something like this:

x    y
1    a
1    b
1    c
1    NA
1    NA
2    d
2    e
2    NA
2    NA
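To try the answers below on the sample, it can be rebuilt as an R data frame. This is a sketch: the name `df1` is an assumption (borrowed from one of the answers), and `y` is assumed to be a character column.

```r
# Hypothetical reconstruction of the sample data shown above
df1 <- data.frame(
  x = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  y = c("a", "b", "c", NA, NA, "d", "e", NA, NA),
  stringsAsFactors = FALSE  # keep y as character rather than factor
)
str(df1)
```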

My desired output is a data frame that shows, for each value of x, the number of complete cases of y (that is, its non-NA values). For example, if y has 2500 complete observations for x = 1 and 557 for x = 2, I should get this simple data frame:

x    y(c.cases)
1    2500
2    557
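For the small sample above, the desired table would be 3 for x = 1 and 2 for x = 2. One base-R way to get it, offered as a sketch rather than the method used in the answers, relies on the formula method of `aggregate()`, which drops rows containing NA by default (`na.action = na.omit`), so `length()` then counts only the complete cases of y per x. The data frame name `df1` is assumed.

```r
df1 <- data.frame(
  x = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  y = c("a", "b", "c", NA, NA, "d", "e", NA, NA),
  stringsAsFactors = FALSE
)

# Rows with NA are removed before grouping, so length() counts non-NA y values;
# this gives 3 for x = 1 and 2 for x = 2
counts <- aggregate(y ~ x, data = df1, FUN = length)
print(counts)
```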

My function currently works, but only for a single X. When I pass a range of Xs (e.g. 30:25), I get the total number of complete observations across all the specified Xs instead of a separate count for each X. This is an outline of my function:

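complete <- function(directory, X) {
    files <- list.files(directory, full.names = TRUE)  # files to read
    dat <- c()     # starts empty; each file's rows are appended below
    Y <- c()       # empty vector that will hold the complete-case counts
    result <- c()
    for (i in X) {
        dat <- rbind(dat, read.csv(files[i]))  # append file i to dat
    }
    # Everything below runs once, after the loop, on all files combined
    dat_subset_Y <- dat[which(dat[, 'X'] %in% X), ]
    Y <- c(Y, sum(complete.cases(dat)))
    result <- cbind(X, Y)
    print(result)
}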

There are no errors or warning messages, only wrong results when X is a range.
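In the outline, the counting happens once after the loop on the combined data, which is why a range of Xs yields a single total. A minimal sketch of a per-file fix, assuming each CSV holds the rows for one id and that file `X[k]` is the `X[k]`-th CSV in alphabetical order (as `files[i]` implies); the function name `complete_by_file` and the demo files are hypothetical.

```r
# Hypothetical corrected loop: count complete cases inside the loop,
# once per requested file, instead of once on all files combined
complete_by_file <- function(directory, X) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  Y <- integer(length(X))              # one count per requested id
  for (k in seq_along(X)) {
    dat <- read.csv(files[X[k]])       # read only the file for this id
    Y[k] <- sum(complete.cases(dat))   # rows with no NA in any column
  }
  data.frame(x = X, y = Y)             # one row per id, in the order given
}

# Usage with two small hypothetical files in a temporary directory
demo_dir <- file.path(tempdir(), "complete_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(y = c("a", "b", "c", NA, NA)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(y = c("d", "e", NA, NA)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)
res <- complete_by_file(demo_dir, 2:1)
print(res)
```

Preallocating `Y` and filling it by position keeps each count aligned with its id, even when the range is descending like 30:25.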

Upvotes: 0

Views: 183

Answers (2)

Paulo E. Cardoso

Reputation: 5856

There is no need for that loop:

library(dplyr)
df %>%
  filter(complete.cases(.)) %>%
  group_by(x) %>%
  summarise(sumy = length(y))

Or, counting the non-NA values of y directly:

df %>%
  group_by(x) %>%
  summarise(sumy = sum(!is.na(y)))
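The two dplyr variants agree on the two-column sample, but they differ once the data has more columns: `filter(complete.cases(.))` drops a row if any column is NA, while `sum(!is.na(y))` looks only at y. A base-R sketch of that difference, assuming a hypothetical extra column `z` (dplyr is not needed to run it):

```r
df <- data.frame(
  x = c(1, 1, 1, 2, 2),
  y = c("a", "b", NA, "d", "e"),
  z = c(10, NA, 30, 40, 50),   # hypothetical extra column with one NA
  stringsAsFactors = FALSE
)

# Like filter(complete.cases(.)): a row counts only if every column is non-NA
by_all_columns <- table(df$x[complete.cases(df)])

# Like sum(!is.na(y)): a row counts if y alone is non-NA
by_y_only <- tapply(!is.na(df$y), df$x, sum)

print(by_all_columns)  # x = 1 loses the row where only z is NA
print(by_y_only)
```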

Upvotes: 2

akrun

Reputation: 886968

We can use data.table: convert the data.frame to a data.table in place (setDT(df1)), group by x, and count the non-NA elements of y with sum(!is.na(y)).

library(data.table)
setDT(df1)[, list(y=sum(!is.na(y))), by = x]
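The counting in `sum(!is.na(y))` works because summing a logical vector treats TRUE as 1 and FALSE as 0. A base-R sketch of the same per-group expression, using `split()` in place of `by = x` (this is an illustration, not data.table's internals); `df1` is the assumed sample data.

```r
df1 <- data.frame(
  x = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  y = c("a", "b", "c", NA, NA, "d", "e", NA, NA),
  stringsAsFactors = FALSE
)

y1 <- df1$y[df1$x == 1]    # "a" "b" "c" NA NA
print(!is.na(y1))          # TRUE TRUE TRUE FALSE FALSE
print(sum(!is.na(y1)))     # TRUE counts as 1 and FALSE as 0, so 3

# The same expression applied to each group of x
per_group <- vapply(split(df1$y, df1$x),
                    function(v) sum(!is.na(v)), integer(1))
print(per_group)
```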

Another option is table():

with(df1, table(x, !is.na(y)))
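The `table()` call returns a two-way table with a FALSE column (NA counts) and a TRUE column (non-NA counts). A sketch that keeps only the TRUE column and reshapes it into the desired x/y data frame, assuming the sample is in `df1`; note that if y had no non-NA values at all, the TRUE column would not exist.

```r
df1 <- data.frame(
  x = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  y = c("a", "b", "c", NA, NA, "d", "e", NA, NA),
  stringsAsFactors = FALSE
)

tab <- with(df1, table(x, !is.na(y)))
print(tab)

# Keep the TRUE column (non-NA counts) and turn the row names back into x values
result <- data.frame(x = as.numeric(rownames(tab)),
                     y = as.vector(tab[, "TRUE"]))
print(result)
```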

Upvotes: 3
