I need to take the 3rd column of each of my 50 CSV files and compute its variance in R.
files <- list.files(path="path\\to\\csv", pattern="\\.csv$", full.names=TRUE, recursive=FALSE)
lapply(files, function(x) {
  t <- read.csv(x, header=F) # load file
  # apply function
  out <- var(t[3])
  out
  # write to file
  #write.csv(out, "path\\to\\dir\\variances.csv", sep="\t", quote=F, row.names=F, col.names=T)
})
This is what I have so far. I need help using only the 2nd row through the last row of each CSV file, and only the 3rd column, to compute the variances.
I would also like to write a data frame to a CSV file, with each file's name (without ".csv") as a column name and that file's variance as the value. Basically, it will be a 1x50 data frame.
Thank you for your help.
Here is a complete, working example using Pokémon statistics from pokemondb.net. We'll download the data, extract it to a folder of 8 CSV files (one for each of the first 8 generations of Pokémon), and then read each file, subsetting to the 8th column and rows 2 through N.
We'll calculate the variance of each of these columns, then use unlist() to combine the statistics into a single vector.
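Applied to the layout described in the question (3rd column, rows 2 through N, file names without ".csv"), the same pattern might look like the minimal sketch below. It assumes the first line of each file is a header or label row to discard, so it reads with header = FALSE and skip = 1. The directory and the file1 to file3 demo files are hypothetical stand-ins created in tempdir() so the code runs on its own. It swaps in sapply(), which simplifies straight to a named vector instead of lapply() followed by unlist().

```r
# Hypothetical demo directory and files so the sketch is self-contained;
# for real data, point csv_dir at the folder holding the CSV files instead.
csv_dir <- file.path(tempdir(), "variance_demo")
dir.create(csv_dir, showWarnings = FALSE)
for (i in 1:3) {
  demo <- data.frame(id = 1:5, label = letters[1:5], value = (1:5) * i)
  write.csv(demo, file.path(csv_dir, paste0("file", i, ".csv")), row.names = FALSE)
}

# pattern is a regular expression, not a glob: "\\.csv$" matches names ending in .csv
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

variances <- sapply(files, function(f) {
  # skip = 1 drops the first line (assumed to be a header/label row);
  # header = FALSE then reads the remaining rows, and [[3]] takes column 3
  values <- read.csv(f, header = FALSE, skip = 1)[[3]]
  var(values, na.rm = TRUE)
})

# Name each variance after its file, without the directory or ".csv"
names(variances) <- tools::file_path_sans_ext(basename(files))
variances
```

Skipping the label line at read time keeps column 3 numeric; reading it with header = FALSE and dropping row 1 afterwards would leave the column as character, so it would need as.numeric() before var().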
download.file("https://raw.githubusercontent.com/lgreski/pokemonData/master/PokemonData.zip",
              "pokemonData.zip",
              method="curl", mode="wb")
unzip("pokemonData.zip", exdir="./pokemonData")
thePokemonFiles <- list.files("./pokemonData",
                              full.names=TRUE)
varianceList <- lapply(thePokemonFiles, function(x) {
  # read data (the header line becomes column names), keep the 8th column
  # and drop the first data row
  data <- read.csv(x)[-1, 8]
  var(data, na.rm=TRUE)
})
# unlist to combine into a vector
unlist(varianceList)
...and the output:
> unlist(varianceList)
[1] 716.7932 812.0668 968.6125 915.8592 934.8132 1607.4362 1049.9671
[8] 1016.2672
NOTE: on Windows, use the method="wininet" argument in download.file().
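For the second part of the question, a one-row data frame with one column per file can be built from a named vector of variances and written with write.csv(). This is a minimal sketch that assumes the variances are already named by file; for the Pokémon example, something like names(varianceList) <- tools::file_path_sans_ext(basename(thePokemonFiles)) before calling unlist() would do that. The values and the variances.csv output path below are hypothetical.

```r
# Hypothetical named variances, e.g. a named result of unlist(varianceList)
variances <- c(file1 = 2.5, file2 = 10, file3 = 22.5)

# Each list element becomes one column, giving a 1 x N data frame.
# check.names = FALSE keeps file names that are not syntactic R names as-is.
variance_df <- data.frame(as.list(variances), check.names = FALSE)

# Hypothetical output location
out_file <- file.path(tempdir(), "variances.csv")
write.csv(variance_df, out_file, row.names = FALSE)

# Read the file back to inspect its layout
back <- read.csv(out_file, check.names = FALSE)
dim(back)  # 1 3
```

With 50 input files, the same code yields the 1x50 layout asked for, one column per file name.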