Reputation: 492
Despite reading the existing answers about this Error, I still don't know how to fix this problem in my particular case.
I have to get the sum of complete cases in a list of files. Each file (e.g. file1) corresponds to an id (e.g. id1 for file1). My goal is to get a data frame with the number of complete cases for each id (and therefore for each file, as file1 contains the pollutants of id1, file2 contains the pollutants of id2, and so on).
When I run the function: complete("pollu", 1:10)
--> everything works perfectly
complete("pollu", 34)
I get ID 34 times, with 33 times returning NA and finally returning the number of complete cases.
complete(".", c(2, 4, 8, 10, 12))
I get the error:
Error in data.frame(id, nobs) : arguments imply differing number of rows: 5, 12
Any help on understanding the error and fixing it would be appreciated.
complete <- function(directory,id=1:332) {
nobs <- vector()
files <- list.files(directory)
for (i in id) {
ID <- id
file <- read.csv(files[i])
nobs[i] <- sum(complete.cases(file),na.rm = TRUE)
}
df <- data.frame(ID,nobs)
colnames(df) <- c("ID", "nobs")
return (df)
}
Upvotes: 2
Views: 20984
Reputation: 1297
The problem lies in the for loop and how you've assigned a value to nobs[i]
complete("pollu", 34)
The loop only runs once, with i <- 34. But you assign a result to nobs[i], which is actually nobs[34]. This gives you a vector with the 34th value assigned, leaving the others NA by default.
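A minimal sketch of that behaviour, using a made-up count (100) instead of reading a real file: assigning past the end of an empty vector pads every earlier position with NA.

```r
nobs <- vector()     # empty vector, as in the question
nobs[34] <- 100L     # hypothetical count, assigned the way the loop does for i = 34

length(nobs)         # 34
sum(is.na(nobs))     # 33 positions were padded with NA
```

Because id has length 1 here, data.frame() recycles it across all 34 rows, which is why the ID shows up 34 times.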
complete(".", c(2, 4, 8, 10, 12))
The loop iterates over your 5 values, the biggest being 12. In the last iteration you assign a value to nobs[12], so your nobs vector has length 12, while id has only length 5. That mismatch is what data.frame() complains about.
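The mismatch can be reproduced without any files. This is only an illustration: the counts are placeholders (0) rather than real results from read.csv().

```r
id <- c(2, 4, 8, 10, 12)
nobs <- vector()
for (i in id) nobs[i] <- 0L   # placeholder counts, indexed by the id value itself

length(id)     # 5
length(nobs)   # 12

# data.frame() cannot recycle 5 values into 12 rows, so it stops with an error
res <- try(data.frame(id, nobs), silent = TRUE)
inherits(res, "try-error")    # TRUE
```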
To fix
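for (i in seq_along(id)) {
  ID <- id[i]                   # the file number requested at position i
  file <- read.csv(files[ID])   # read the file that belongs to that id
  nobs[i] <- sum(complete.cases(file), na.rm = TRUE)   # store at position i, not at position ID
}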
i will take the values 1, 2, 3, ... up to the number of ids you request, so nobs ends up with exactly one entry per id.
EDIT
As id already contains the labels you require, you can use
df <- data.frame(id, nobs)
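Putting the pieces together, the whole function might look like the sketch below. Two things here are assumptions rather than part of the original code: full.names = TRUE, so that read.csv() can find the files when directory is not the working directory, and the demo data, which is made up (three small CSV files written to a temporary directory).

```r
complete <- function(directory, id = 1:332) {
  # full.names = TRUE is an assumption; it returns paths that include `directory`
  files <- list.files(directory, full.names = TRUE)
  nobs <- integer(length(id))            # one slot per requested id
  for (i in seq_along(id)) {
    dat <- read.csv(files[id[i]])        # id[i] selects the file
    nobs[i] <- sum(complete.cases(dat))  # i selects the slot in nobs
  }
  data.frame(id, nobs)
}

# Made-up demo data: file k holds the first k rows of a small table
demo_dir <- tempfile("pollu_demo")
dir.create(demo_dir)
toy <- data.frame(sulfate = c(1, NA, 3), nitrate = c(NA, 2, 3))
for (k in 1:3) {
  write.csv(toy[seq_len(k), ], file.path(demo_dir, sprintf("%03d.csv", k)),
            row.names = FALSE)
}

res <- complete(demo_dir, c(1, 3))   # two rows, one per requested id
```

Pre-allocating nobs with integer(length(id)) is optional, but it makes the intended length explicit, so the result always has as many rows as there are ids.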
Upvotes: 3