subset rows by a value while filtering columns in R

Question

I have several datasets ("001.csv", "002.csv", and so on, up to "332.csv") stored in the same folder, with the following structure (example):

id  p1    p2    
2   35.0  na    
2   5.00  2.05  
2   0.35  1.56  
2   na    0.79 
2   5.23  0.13
2   5.01  0.03

I need to create a function that reads one or more files and returns the number of cases where both "p1" and "p2" have a value (that is, neither is NA). I wrote this:

cc <- function(directory, id=1:332) {
    files_list <- list.files(directory, full.names = TRUE)
    for (i in id) {
        dat <- read.csv(files_list[i])
    }
    nobs <- length(which(!is.na(dat$p1) & !is.na(dat$p2)))
    completecases <- data.frame(id, nobs)
    completecases
}

This works perfectly if I choose a single value for "id"; in that case, the outcome would be something like:

> cc(directory, 1)
    id nobs
    1  3

But if I ask for the number of observations in more than one file, every "id" gets the count from the last file read (the highest "id"). For instance,

> cc(directory, 1:2)
    id nobs
    1  4
    2  4

instead of:

> cc(directory, 1:2)
    id nobs
    1  3
    2  4

I believe I need to subset my data by "id" or use "rbind" for each "id", but I have failed so far to get the right formula. Does anyone know how to fix this?

mto23 · Accepted Answer

It was not working because I needed to compute "nobs" inside the for loop, once per file, like this:

cc <- function(directory, id=1:332) {
    files_list <- list.files(directory, full.names = TRUE)
    nobs <- c()  # one count per requested id, filled in below
    for (i in id) {
        dat <- read.csv(files_list[i])
        # count rows where neither p1 nor p2 is NA, and append to nobs
        nobs <- c(nobs, length(which(!is.na(dat$p1) & !is.na(dat$p2))))
    }
    completecases <- data.frame(id, nobs)
    completecases
}

In my original version, "nobs" was computed only once, after the loop had finished. By then "dat" held only the last file read, so every "id" was given the count for the last value of "id".
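
For comparison, here is a minimal sketch of the same idea without growing a vector inside a loop. It is not part of the original answer. The name cc_vapply is hypothetical, and the sketch assumes that the files are named "001.csv" to "332.csv", that each has columns "p1" and "p2", and that missing values are written as NA (the read.csv default). It builds each path from the id instead of indexing into list.files(), so the result does not depend on what else is in the folder.

```r
# Sketch only: cc_vapply is a hypothetical name, not from the original post.
# Assumes files named 001.csv ... 332.csv with columns p1 and p2.
cc_vapply <- function(directory, id = 1:332) {
    # Build each path from its id (1 -> "001.csv"), so the lookup does not
    # rely on the order or contents of list.files().
    paths <- file.path(directory, sprintf("%03d.csv", id))

    # One count per file; vapply checks that each result is a single integer.
    nobs <- vapply(paths, function(path) {
        dat <- read.csv(path)
        # complete.cases() is TRUE for rows with no NA in the given columns
        sum(complete.cases(dat[, c("p1", "p2")]))
    }, integer(1), USE.NAMES = FALSE)

    data.frame(id = id, nobs = nobs)
}
```

Calling cc_vapply(directory, 1:2) should return one row per requested id, each with its own count, in the same shape as the data frame returned by cc.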
