Reputation: 67
I have a list of 701 given csv files. Each one has the same number of columns (7) but a different number of rows (between 25000 and 28000).
Here is an extract of the first file:
Date,Week,Week Day,Hour,Price,Volume,Sale/Purchase
18/03/2011,11,5,1,-3000.00,17416,Sell
18/03/2011,11,5,1,-1001.10,17427,Sell
18/03/2011,11,5,1,-1000.00,18055,Sell
18/03/2011,11,5,1,-500.10,18057,Sell
18/03/2011,11,5,1,-500.00,18064,Sell
18/03/2011,11,5,1,-400.10,18066,Sell
18/03/2011,11,5,1,-400.00,18066,Sell
18/03/2011,11,5,1,-300.10,18068,Sell
18/03/2011,11,5,1,-300.00,18118,Sell
Now I am trying to plot Volume against Date, on the condition that Price is exactly 200.00. I would then like a single window where I can see how Volume progresses over time.
allenamen <- dir(pattern="*.csv")
alledat <- lapply(allenamen, read.csv, header = TRUE,
sep = ",", stringsAsFactors = FALSE)
verlauf <- function(a) {plot(Volume ~ Date, a,
data=subset(a, (Price=="200.00")),
ylim = c(15000, 45000),
xlim = as.Date(c("2011-12-30", "2013-01-20")), type = "l")}
lapply(alledat, verlauf)
But I get this error:
error in strsplit(log, NULL): non-character argument
How can I avoid the error?
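A note on where the error comes from, as a minimal sketch on a small made-up data frame (the values are hypothetical). In the question's call, `a` is passed a second time as an unnamed argument. plot.formula() forwards unnamed extra arguments to plot.default(), where the first positional argument not already supplied by name (type, xlim and ylim are named) is `log`, so a data frame ends up as `log`. Older R versions report this as `strsplit(log, NULL): non-character argument`; the exact message may differ in newer versions. Separately, `Price == "200.00"` compares text, because the number 200 is converted to the string "200":

```r
# Small made-up data frame with some of the question's column names
a <- data.frame(Hour   = 1:3,
                Price  = c(200, 200, -300),
                Volume = c(17416, 17427, 18055))

# Comparing a numeric column with a string compares text:
# 200 is converted to "200", which is not equal to "200.00"
print(a$Price == "200.00")  # FALSE FALSE FALSE
print(a$Price == 200)       # TRUE TRUE FALSE

# As in the question, `a` is also passed as an unnamed second argument.
# plot.formula() forwards it to plot.default(), where it lands in `log`,
# so the call fails
res <- try(plot(Volume ~ Hour, a, data = subset(a, Price == 200),
                ylim = c(15000, 20000), xlim = c(0, 4), type = "l"),
           silent = TRUE)
print(inherits(res, "try-error"))  # TRUE

# Passing the data once, through `data =`, and comparing with a number works
plot(Volume ~ Hour, data = subset(a, Price == 200),
     ylim = c(15000, 20000), xlim = c(0, 4), type = "l")
```

In the question's data, Date is also still text at this point, so it needs converting with as.Date() before a Date-valued xlim makes sense.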
Upvotes: 4
Views: 879
Reputation: 99331
Here are a couple of suggestions.
Use list.files, not dir, to find files. In R, dir is an alias of list.files, so this is mainly a readability choice; used as in your code, both list the files in the current working directory.
header = TRUE and sep = "," are the default arguments of read.csv, and are therefore unnecessary in your code.
Subset each file as it is read, so that only the rows you need are kept in memory.
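On the first two suggestions: the pattern argument of list.files() is a regular expression, not a shell wildcard, so the glob-style "*.csv" is looser than it looks. A small sketch (the file names are made up) of a stricter pattern, plus a check of the read.csv() defaults:

```r
# Hypothetical file names, only to show what the pattern matches
fnames <- c("2011-03-18.csv", "notes_csv.txt", "backup.csvx", "2011-03-19.CSV")

# "\\.csv$" means: a literal dot followed by "csv" at the very end of the name
print(grepl("\\.csv$", fnames))                      # TRUE FALSE FALSE FALSE
print(grepl("\\.csv$", fnames, ignore.case = TRUE))  # TRUE FALSE FALSE TRUE

# The same pattern applied to the current working directory
csv_files <- list.files(pattern = "\\.csv$")

# header = TRUE and sep = "," are already read.csv()'s defaults
print(formals(read.csv)$header)  # TRUE
print(formals(read.csv)$sep)     # ","
```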
Here's a suggested method.
fnames <- list.files(pattern = "*.csv")
read <- lapply(fnames, function(x) {
  rd <- read.csv(x, stringsAsFactors = FALSE)
  subset(rd, Price == 200)
})
dat <- do.call(rbind, read)
You should then be able to plot dat.
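Before plotting, dat still holds Date as "dd/mm/yyyy" text, and list.files() returns names in alphabetical order, which need not be chronological. Since type = "l" joins points in row order, a sketch of the remaining steps, using a small made-up stand-in for dat:

```r
# Hypothetical stand-in for `dat` after do.call(rbind, read):
# Date is still text in the "dd/mm/yyyy" form used by the files
dat <- data.frame(Date   = c("19/03/2011", "18/03/2011", "21/03/2011"),
                  Price  = 200,
                  Volume = c(18100, 17416, 18300),
                  stringsAsFactors = FALSE)

# Convert the text dates into Date objects so the x axis is a time axis
dat$Date <- as.Date(dat$Date, format = "%d/%m/%Y")

# type = "l" connects points in row order, so sort by date first
dat <- dat[order(dat$Date), ]

plot(Volume ~ Date, data = dat, type = "l")
```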
Upvotes: 2
Reputation: 83215
When you want to combine all subsets for Price == 200 into one plot, you can use the following function:
plotprice <- function(x) {
  files <- list.files(pattern = "*.csv")
  df <- data.frame()
  for (i in seq_along(files)) {  # seq_along() is safe when no files are found
    xx <- read.csv(files[i])
    xx <- subset(xx, Price == x)
    df <- rbind(df, xx)
  }
  df$Date <- as.Date(as.character(df$Date), format = "%d/%m/%Y")
  df <- df[order(df$Date), ]  # type = "l" connects points in row order
  plot(Volume ~ Date, df, ylim = c(15000, 45000),
       xlim = as.Date(c("2011-12-30", "2013-01-20")), type = "l")
}
With plotprice(200) you will get everything for Price == 200 in one plot.
When you want a plot for each csv file, you can use:
ploteach <- function(x) {
  files <- list.files(pattern = "*.csv")
  for (i in seq_along(files)) {  # seq_along() is safe when no files are found
    df <- read.csv(files[i])
    df <- subset(df, Price == x)
    df$Date <- as.Date(as.character(df$Date), format = "%d/%m/%Y")
    plot(Volume ~ Date, df, ylim = c(15000, 45000),
         xlim = as.Date(c("2011-12-30", "2013-01-20")), type = "l")
  }
}
ploteach(200)
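In an interactive session, each plot() call inside ploteach() replaces the previous plot, so only the last file stays visible. One option is to open a pdf() device first so that every plot becomes a page of a single file. A sketch with made-up data frames standing in for the per-file subsets and a hypothetical output file name:

```r
# Hypothetical per-file subsets standing in for what ploteach() reads
subsets <- list(
  data.frame(Hour = 1:3, Volume = c(17416, 17427, 18055)),
  data.frame(Hour = 1:3, Volume = c(18100, 18064, 18066))
)

# Hypothetical output file; each plot() call below starts a new page
out_file <- file.path(tempdir(), "volume_per_file.pdf")
pdf(out_file)
for (df in subsets) {
  plot(Volume ~ Hour, data = df, type = "l")
}
dev.off()  # close the device so the file is written completely
```

Inside ploteach(), the same idea means calling pdf() before the loop and dev.off() after it.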
Upvotes: 2
Reputation: 611
OK, first you need to turn the result of your lapply(..., read.csv) call, a list of 701 data frames, into a single data frame.
Added a function that reads and subsets each file, to avoid running out of RAM:
#
# function to read and subset data to avoid running out of RAM
read.subset <- function(dateiname) {
  a <- read.csv(file = dateiname, header = TRUE, sep = ",",
                stringsAsFactors = FALSE)
  a <- a[a$Price == 200.00, ]
  print(gc())  # monitor and clean RAM after each file is read
  return(a)
}
* Update 2: Added a faster implementation of read.subset using scan
# function to read and subset data to avoid running out of RAM
read.subset.fast <- function(dateiname) {
  # read the csv into a list of columns (1 character, 5 numeric, 1 character)
  a <- scan(file = dateiname,
            what = c(list(character()),
                     rep(list(numeric()), 5),
                     list(character())),
            skip = 1,  # skip header (equivalent to header = TRUE)
            sep = ",")
  # efficiently turn the list into a data.frame
  attributes(a) <- list(class = "data.frame",
                        row.names = c(NA_integer_, length(a[[1]])),
                        names = scan(file = dateiname,
                                     what = character(),
                                     skip = 0,
                                     nlines = 1,  # read only the first line to get the column names
                                     sep = ","))
  # subset data
  a <- a[a$Price == 200.00, ]
  print(gc())
  return(a)
}
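If the scan() version is more than you need, read.csv() can also be told the column types up front through its colClasses argument; the read.table documentation notes that this can reduce memory use. A sketch on a tiny temporary file that mimics the question's layout (the file and its two rows are made up):

```r
# Tiny temporary CSV with the question's header (rows are made up)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Date,Week,Week Day,Hour,Price,Volume,Sale/Purchase",
             "18/03/2011,11,5,1,200.00,17416,Sell",
             "18/03/2011,11,5,1,-300.00,18118,Sell"), tmp)

# Same column types as the `what` list in read.subset.fast():
# one character column, five numeric columns, one character column
cls <- c("character", rep("numeric", 5), "character")

a <- read.csv(tmp, colClasses = cls)
a <- a[a$Price == 200, ]
print(a)
print(names(a))
```

Unlike the scan() version, read.csv() turns the header into syntactic names, so two columns come back as Week.Day and Sale.Purchase; Date, Price and Volume are unaffected.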
#
Now let's read, subset and combine data in a single data frame:
#
allenamen <- list.files(pattern="*.csv") # updated (@Richard Scriven)
# get a single data frame, instead of a list of 701 data frames
alledat <- do.call(rbind, lapply(allenamen, read.subset.fast))
#
Convert the dates into the Date format:
# get dates in dates format
alledat$Date <- as.Date(as.character(alledat$Date), format="%d/%m/%Y")
Then you are good to go; no function is needed. Just plot it:
plot(Volume ~ Date,
data = alledat,
ylim = range(Volume),
xlim = range(Date),
type = "l")
Upvotes: 0