Reputation: 21
While I thought I was on track to becoming an R guru in no time, my most recent problem sets were a rude awakening. I've searched this community and worked through a variety of tutorials before posting this question. I need to loop through a directory of CSV files and build a data frame that shows the number of complete cases for each file. So if I search files 1:3 in the directory, the result should be a data frame pairing each file with its own count: 1 - X, 2 - Y, 3 - Z. When I run this code:
complete <- function(directory, id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  for (file in id) {
    data <- data.frame()
    data <- rbind(data, read.csv(files_list[file], header = TRUE))
    nobs <- sum(complete.cases(data))
  }
  allnobs <- data.frame(id, nobs)
  allnobs
}
I get a data.frame that lists the number of complete cases for the last CSV file in id on every row. 192 should pair only with ID 8, and every other ID should have its own number of complete cases. My result, with 192 listed for each ID:
> complete("specdata", 1:8)
  id nobs
1  1  192
2  2  192
3  3  192
4  4  192
5  5  192
6  6  192
7  7  192
8  8  192
I also tried creating the initial data.frame outside the for loop:
complete <- function(directory, id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  data <- data.frame()
  for (file in id) {
    data <- rbind(data, read.csv(files_list[file], header = TRUE))
    nobs <- sum(complete.cases(data))
  }
  allnobs <- data.frame(id, nobs)
  allnobs
}
This ends up giving me the total number of complete cases across all files on every row:
> complete("specdata", 1:8)
  id nobs
1  1 3139
2  2 3139
3  3 3139
4  4 3139
5  5 3139
6  6 3139
7  7 3139
8  8 3139
Any assistance here would be greatly appreciated.
Upvotes: 0
Views: 1519
Reputation: 21
Here you go (dir is your directory):
complete <- function(dir, id) {
  # full.names = TRUE returns usable paths, so no setwd() calls are needed
  file_list <- list.files(dir, full.names = TRUE)
  nobs <- integer(length(id))
  p <- 1
  for (i in id) {
    data <- read.csv(file_list[i], header = TRUE)
    # store each file's count in its own slot instead of overwriting one value
    nobs[p] <- sum(complete.cases(data))
    p <- p + 1
  }
  cbind(id, nobs)
}
The output:
> complete("specdata", 1:8)
     id nobs
[1,]  1  117
[2,]  2 1041
[3,]  3  243
[4,]  4  474
[5,]  5  402
[6,]  6  228
[7,]  7  442
[8,]  8  192
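As for why both attempts in the question repeat one number: nobs is a single value that gets reassigned on every pass through the loop, and data.frame(id, nobs) then recycles that one value across every id. In the first attempt data is reset each pass, so the last file's count (192) survives. In the second attempt data keeps growing, so the running total (3139, which is the sum of the eight counts above) survives. If you want a data.frame back instead of a matrix, a minimal sketch with vapply could look like the one below. It assumes the sorted file order in the directory lines up with the ids, and the pattern argument is my own addition to skip any non-CSV files.

```r
# Sketch: one complete-case count per requested id, returned as a data.frame.
complete <- function(directory, id = 1:332) {
  # Assumption: sorted file names (e.g. 001.csv, 002.csv, ...) line up with the ids.
  files_list <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  # vapply returns one integer per selected file, so no count is overwritten.
  nobs <- vapply(files_list[id], function(f) {
    sum(complete.cases(read.csv(f, header = TRUE)))
  }, integer(1), USE.NAMES = FALSE)
  data.frame(id = id, nobs = nobs)
}
```

Calling complete("specdata", 1:8) should then give the same eight counts as above, with id and nobs as data.frame columns rather than matrix columns.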
--Regards DUDU
Upvotes: 0