Reputation: 69
I am having trouble figuring out for loops in R after working in Python for a while. What I want to do is pull out $nitrate or $sulfate from the list of CSV files that this code returns:
getpollutant <- function(id=1:332, directory, pollutant) {
  data <- c()
  for (i in id) {
    data[i] <- c(paste(directory, "/", formatC(i, width=3, flag=0), ".csv", sep=""))
  }
  df <- c()
  for (d in 1:length(data)) {
    df[[d]] <- c(read.csv(data[d]))
  }
  df
}
I haven't included the for loop for pollutant yet. I've tried many different approaches but can't get it to work quite right. With the code above I can call getpollutant(1:10, "specdata") and it gives me all the CSV files from the specdata directory labelled 001 through 010. Each file is printed as a separate chunk, with headers of the form [[i]]$columnname and the contents of that column listed below. What I want to do is pull out one specific column name (pollutant) and return the contents of that column from every CSV file. I have read through the help pages and just can't seem to get the syntax right.
@RomanLuštrik I don't know if this is what you're looking for, but here's a sample of the output if I call getpollutant(1, "specdata"):
[[1]]
[[1]]$Date
[1] 2003-01-01 2003-01-02 2003-01-03
[[1]]$sulfate
[1] NA NA NA NA NA NA 7.210 NA NA NA 1.300
[[1]]$nitrate
[1] NA NA NA .474 NA NA NA .964 NA NA NA
Obviously this is a very small version of the actual output, but basically the function takes the CSV files in the specified id range and prints them out like this.
Upvotes: 0
Views: 211
Reputation: 49660
Do you only want to read a certain column from each file, and do you know its position (e.g. the 3rd column)? In that case you can use the colClasses argument to read.table/read.csv to read only that column (an entry of "NULL" makes the reader skip the corresponding column).
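As a rough illustration of the colClasses approach, here is a minimal sketch. The temporary file and its three columns (Date, sulfate, nitrate) are made up to mirror the sample output in the question, and the sketch assumes the wanted column is known to be the third one.

```r
# Write a small hypothetical file with the same columns as the question's data.
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(Date    = c("2003-01-01", "2003-01-02", "2003-01-03"),
             sulfate = c(NA, 7.21, 1.3),
             nitrate = c(0.474, NA, 0.964)),
  tmp, row.names = FALSE
)

# The string "NULL" tells read.csv to skip a column entirely,
# so only the third column (nitrate) is read, as numeric.
nitrate_only <- read.csv(tmp, colClasses = c("NULL", "NULL", "numeric"))
names(nitrate_only)
```

The result is still a data frame, just with a single column, so nitrate_only[[1]] gives the plain vector.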
If you don't know ahead of time which column it is, then you need to read in the entire file and return only the requested column. In that case you probably want [[ ]] subsetting instead of $ subsetting, since [[ ]] accepts a column name stored in a variable, whereas $ takes the name literally.
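To make the difference between the two kinds of subsetting concrete, here is a small sketch. The data frame is invented for illustration; the variable name pollutant matches the function argument in the question.

```r
# Hypothetical data frame standing in for one CSV file's contents.
df <- data.frame(sulfate = c(NA, 7.21, 1.3),
                 nitrate = c(0.474, NA, 0.964))
pollutant <- "nitrate"

# $ looks for a column literally named "pollutant", which does not exist.
by_dollar <- df$pollutant

# [[ ]] evaluates the variable first, so it looks up the "nitrate" column.
by_brackets <- df[[pollutant]]

is.null(by_dollar)
by_brackets
```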
You can also make your code more compact, and possibly more efficient, by using sprintf together with lapply or sapply.
Consider this code:
lapply(1:332, function(id) {
  read.csv( sprintf("%s/%03d.csv", directory, id) )
})
or
sapply( list.files(directory, pattern='\\.csv$',full.names=TRUE),
function(nm) read.csv(nm)[[pollutant]] )
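Putting these pieces together, one possible shape for the function from the question is sketched below. It keeps the asker's getpollutant signature, assumes the files are named 001.csv, 002.csv, and so on, and uses a made-up demo directory under tempdir() so that it runs without the real specdata folder. Treat it as a sketch under those assumptions, not as the only way to do it.

```r
# Sketch: read each requested file and keep only the requested column.
getpollutant <- function(id = 1:332, directory, pollutant) {
  # Build paths such as "specdata/001.csv" (sprintf is vectorised over id).
  files <- file.path(directory, sprintf("%03d.csv", id))
  # One vector per file, selected by name with [[ ]].
  columns <- lapply(files, function(f) read.csv(f)[[pollutant]])
  # Flatten into a single vector across all files.
  unlist(columns)
}

# Hypothetical demo data: two tiny files in a temporary directory.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:2) {
  write.csv(
    data.frame(Date    = c("2003-01-01", "2003-01-02"),
               sulfate = c(NA, i),
               nitrate = c(i / 10, NA)),
    file.path(demo_dir, sprintf("%03d.csv", i)),
    row.names = FALSE
  )
}

result <- getpollutant(1:2, demo_dir, "nitrate")
result
```

Dropping the unlist() call would instead keep the values as a list with one element per file, which may be preferable if you need to know which file each value came from.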
Upvotes: 1