I have several datasets ("001.csv", "002.csv", and so on, up to "332.csv") stored in the same folder, all with the following structure (example):
id p1 p2
2 35.0 na
2 5.00 2.05
2 0.35 1.56
2 na 0.79
2 5.23 0.13
2 5.01 0.03
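A minimal sketch of the count being asked for, with the sample rows above typed in by hand. It assumes the missing entries are read as R's NA. Note that read.csv only treats the string "NA" as missing by default, so files that literally contain "na" would need read.csv(..., na.strings = c("NA", "na")).

```r
# Sample rows from the question, with the missing entries entered as NA
dat <- data.frame(
  id = rep(2, 6),
  p1 = c(35.0, 5.00, 0.35, NA, 5.23, 5.01),
  p2 = c(NA, 2.05, 1.56, 0.79, 0.13, 0.03)
)

# TRUE where both p1 and p2 are present
both_present <- !is.na(dat$p1) & !is.na(dat$p2)

# Summing a logical vector counts its TRUE values
nobs <- sum(both_present)

# Equivalent check restricted to the two columns of interest
nobs_cc <- sum(complete.cases(dat[, c("p1", "p2")]))

print(nobs)     # [1] 4
print(nobs_cc)  # [1] 4
```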
I need to write a function that reads one or more of these files and returns the number of rows in which both "p1" and "p2" have a value (that is, neither is NA). This is what I wrote:
cc <- function(directory, id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  for (i in id) {
    dat <- read.csv(files_list[i])
  }
  nobs <- length(which(!is.na(dat$p1) & !is.na(dat$p2)))
  completecases <- data.frame(id, nobs)
  completecases
}
This works if I pass a single value for "id"; in that case, the output looks like this:
> cc(directory, 1)
id nobs
1 3
But if I ask for more than one file, every "id" gets the number of observations of the last file in "id" (here, the highest one). For instance,
> cc(directory, 1:2)
id nobs
1 4
2 4
instead of:
> cc(directory, 1:2)
id nobs
1 3
2 4
I think I need to subset my data by "id" or call "rbind" for each "id", but so far I have not managed to get it right. Does anyone know how to fix this?
It was not working because "nobs" has to be computed inside the for loop and accumulated there, like this:
cc <- function(directory, id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  nobs <- c()
  for (i in id) {
    dat <- read.csv(files_list[i])
    nobs <- c(nobs, length(which(!is.na(dat$p1) & !is.na(dat$p2))))
  }
  completecases <- data.frame(id, nobs)
  completecases
}
Without that, "nobs" was computed only once, after the loop had finished, so it always reflected the last file read (the last value of "id").
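The same fix can also be sketched without growing "nobs" inside the loop, by letting vapply return one count per requested file. This is an alternative, not the answerer's code: cc2 is a hypothetical name, and, like the original, it assumes list.files returns the files in order so that files_list[i] is file i.

```r
cc2 <- function(directory, id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  # One count per requested id; vapply checks that each result is a single integer
  nobs <- vapply(id, function(i) {
    dat <- read.csv(files_list[i])
    sum(!is.na(dat$p1) & !is.na(dat$p2))
  }, integer(1))
  data.frame(id, nobs)
}

# Usage with two small demo files written to a temporary folder
tmp <- file.path(tempdir(), "cc_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(id = 1, p1 = c(1, NA, 3, 4), p2 = c(1, 2, 3, 4)),
          file.path(tmp, "001.csv"), row.names = FALSE)
write.csv(data.frame(id = 2, p1 = c(1, 2, 3, 4), p2 = c(1, 2, 3, 4)),
          file.path(tmp, "002.csv"), row.names = FALSE)
res <- cc2(tmp, 1:2)
print(res)
```

Because vapply allocates the result once, there is no repeated copying of "nobs", and the integer(1) template makes it fail loudly if a file ever yields something other than a single count.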
Try something like this. I edited your function so that it handles a single file and returns the number of rows left after filtering out rows with NA:
count_nobs <- function(fi) {
  require(dplyr)
  dat <- read.csv(fi)
  # Keep only rows with no NA in any column, then count them (result has a single column, n)
  dat[complete.cases(dat), ] %>% count()
}
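Note that complete.cases(dat) drops a row if any column is missing, including "id". With the three-column layout in the question this is the same as checking "p1" and "p2", but if the files had extra columns the counts could differ. A minimal base-R sketch that restricts the check to the two columns of interest (count_nobs_p is a hypothetical name, and it returns a plain integer rather than a data frame):

```r
count_nobs_p <- function(fi) {
  dat <- read.csv(fi)
  # Only p1 and p2 decide whether a row counts as complete
  sum(complete.cases(dat[, c("p1", "p2")]))
}

# Demo file with an extra, partly missing column that should not affect the count
f <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1, p1 = c(1, NA, 3), p2 = c(1, 2, 3),
                     note = c(NA, "x", NA)),
          f, row.names = FALSE)

n_all <- sum(complete.cases(read.csv(f)))  # every column considered
n_p <- count_nobs_p(f)                     # only p1 and p2 considered
print(c(n_all = n_all, n_p = n_p))
```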
Then call the function with purrr::map_dfr, which iterates through files_list and rbinds the results:
library(tidyverse)
files_list <- list.files(directory, full.names=TRUE)
result <- map_dfr(files_list, ~count_nobs(.x), .id="id")
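For readers without the tidyverse, a rough base-R counterpart of the map_dfr step builds one small data frame per file and stacks them with do.call(rbind, ...), which is the "rbind for each id" idea from the question. Unlike the snippet above, this sketch keeps the question's "id" argument so only the requested files are read. The name cc_rbind is hypothetical, and, as in the question, it assumes files_list[i] is file i.

```r
cc_rbind <- function(directory, id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  # Build a one-row data frame for each requested file
  per_file <- lapply(id, function(i) {
    dat <- read.csv(files_list[i])
    data.frame(id = i, nobs = sum(complete.cases(dat[, c("p1", "p2")])))
  })
  # Stack the one-row data frames into a single result
  do.call(rbind, per_file)
}

# Usage with two demo files; rows come back in the order given by id
tmp <- file.path(tempdir(), "cc_rbind_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(id = 1, p1 = c(1, NA, 3, 4), p2 = c(1, 2, 3, 4)),
          file.path(tmp, "001.csv"), row.names = FALSE)
write.csv(data.frame(id = 2, p1 = c(1, 2, 3, 4), p2 = c(1, 2, 3, 4)),
          file.path(tmp, "002.csv"), row.names = FALSE)
res <- cc_rbind(tmp, 2:1)
print(res)
```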