Reputation: 159
I am an R beginner. The following is my code:
complete <- function(directory, id = 1:332) {
  # Read through all the csv data file
  for (i in id) {
    i <- sprintf("%03d", as.numeric(i))
    data <- read.csv(paste(directory, "/", i, ".csv", sep = ""))
    good <- complete.cases(data) # Eliminating the NA rows
    cases <- sum(good == TRUE) # add complete value
  }
  data.frame(id = id, nobs = cases)
}
When I print the output, I get:
id nobs
1 1 402
2 2 402
3 3 402
4 4 402
5 5 402 (incorrect)
If I just print cases, I get:
[1] 117
[1] 1041
[1] 243
[1] 474
[1] 402
So the correct output should be:
id nobs
1 1 117
2 2 1041
3 3 243
4 4 474
5 5 402
I realize the data frame only takes the last value of cases, because cases is overwritten on every iteration of the loop.
My question is: how can I store each value of cases in a vector, so that the call to data.frame returns the correct output?
Thanks.
Upvotes: 0
Views: 5068
Reputation: 1
complete <- function(directory, id = 1:332) {
  df_total <- data.frame()
  for (x in id) {
    # Build the path to the file for this id, e.g. "directory/001.csv"
    filename <- file.path(directory, sprintf("%03d.csv", x))
    df <- read.csv(filename, header = TRUE)
    # Count the rows with no missing values; pass a subset of columns
    # instead of df to check only the columns you want
    nobs <- sum(complete.cases(df))
    # Append one row per id to the result
    df_total <- rbind(df_total, data.frame(id = x, nobs = nobs))
  }
  df_total
}
Upvotes: 0
Reputation: 81733
This is a more efficient function for the task:
complete <- function(directory, id = 1:332) {
  filenames <- file.path(directory, paste0(sprintf("%03d", id), ".csv"))
  data.frame(id = id,
             nobs = sapply(filenames, function(x)
               sum(complete.cases(read.csv(x)))))
}
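To try the vectorised approach without the real data, here is a minimal sketch that runs against two made-up CSV files in a temporary directory. The function name complete_vapply, the demo directory, and the file contents are assumptions for illustration only. It swaps sapply for vapply with USE.NAMES = FALSE, which declares that each count is a single integer and, I believe, keeps the file paths from ending up as row names of the result.

```r
# Hypothetical variant of the function above; the name is illustrative only.
complete_vapply <- function(directory, id = 1:332) {
  # Build all file paths at once, e.g. "<directory>/001.csv"
  filenames <- file.path(directory, sprintf("%03d.csv", as.integer(id)))
  # One integer count of complete rows per file; drop the path names
  nobs <- vapply(filenames,
                 function(x) sum(complete.cases(read.csv(x))),
                 integer(1), USE.NAMES = FALSE)
  data.frame(id = id, nobs = nobs)
}

# Made-up demo data (assumption, not the asker's files)
vapply_dir <- file.path(tempdir(), "complete_vapply_demo")
dir.create(vapply_dir, showWarnings = FALSE)
write.csv(data.frame(a = c(1, NA, 3), b = c(1, 2, 3)),
          file.path(vapply_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(a = c(1, 2, 3), b = c(NA, 2, 3)),
          file.path(vapply_dir, "002.csv"), row.names = FALSE)

vapply_result <- complete_vapply(vapply_dir, id = 1:2)
print(vapply_result)
```

Each demo file has three rows, one of which contains an NA, so both ids should report two complete rows.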
Upvotes: 1
Reputation: 13310
This should do the job if id is a numeric vector (untested, since you provided no reproducible example).
Otherwise, you should use for (i in seq_along(id)) and id[i] inside the loop.
complete <- function(directory, id = 1:332) {
  cases <- NULL
  # Read through all the csv data file
  for (i in id) {
    i <- sprintf("%03d", as.numeric(i))
    data <- read.csv(paste(directory, "/", i, ".csv", sep = ""))
    good <- complete.cases(data) # Eliminating the NA rows
    cases[i] <- sum(good == TRUE) # add complete value
  }
  data.frame(id = id, nobs = cases)
}
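For reference, the seq_along(id) variant mentioned above could look roughly like the sketch below. It stores each count by position rather than by file name, so it should also cope with ids that are out of order or repeated. The function name complete_by_position and the demo files are assumptions made up for this example, not part of the original code.

```r
# Hypothetical helper name; illustrative only.
complete_by_position <- function(directory, id = 1:332) {
  # Preallocate one slot per requested id
  cases <- integer(length(id))
  for (k in seq_along(id)) {
    # Build e.g. "<directory>/001.csv" from the k-th id
    path <- file.path(directory, sprintf("%03d.csv", as.integer(id[k])))
    data <- read.csv(path)
    # Count the rows with no missing values and store them at position k
    cases[k] <- sum(complete.cases(data))
  }
  data.frame(id = id, nobs = cases)
}

# Made-up demo data in a temporary directory (assumption, not the asker's files)
demo_dir <- file.path(tempdir(), "complete_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = c(1, NA, 3), b = c(1, 2, 3)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(a = c(NA, NA, 3), b = c(1, 2, NA)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

result <- complete_by_position(demo_dir, id = c(2, 1, 2))
print(result)
```

In the demo, file 001 has two complete rows and file 002 has none, so the nobs column should follow the order of the requested ids.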
Upvotes: 1