fYpsE

Reputation: 67

Plot many csv files in one window

I have a list of 701 given csv files. Each one has the same number of columns (7) but different number of rows (between 25000 and 28000).

Here is an extract of the first file:

Date,Week,Week Day,Hour,Price,Volume,Sale/Purchase
18/03/2011,11,5,1,-3000.00,17416,Sell
18/03/2011,11,5,1,-1001.10,17427,Sell
18/03/2011,11,5,1,-1000.00,18055,Sell
18/03/2011,11,5,1,-500.10,18057,Sell
18/03/2011,11,5,1,-500.00,18064,Sell
18/03/2011,11,5,1,-400.10,18066,Sell
18/03/2011,11,5,1,-400.00,18066,Sell
18/03/2011,11,5,1,-300.10,18068,Sell
18/03/2011,11,5,1,-300.00,18118,Sell

Now I am trying to plot Volume against Date, using only the rows where Price is exactly 200.00. I want everything in a single window, so that I can see how the Volume develops over time.

allenamen <- dir(pattern="*.csv")
alledat <- lapply(allenamen, read.csv, header = TRUE, 
   sep = ",", stringsAsFactors = FALSE)
verlauf <- function(a) {plot(Volume ~ Date, a, 
  data=subset(a, (Price=="200.00")), 
  ylim = c(15000, 45000), 
  xlim = as.Date(c("2011-12-30", "2013-01-20")), type = "l")}
lapply(alledat, verlauf)

But I get this error:

error in strsplit(log, NULL): non-character argument

How can I avoid the error?

Upvotes: 4

Views: 879

Answers (3)

Rich Scriven

Reputation: 99331

Here are a couple of suggestions.

  1. Use list.files rather than dir to find the files. dir is an alias for list.files, so both list the current directory when no path is given, but list.files states the intent more clearly.

  2. header = TRUE and sep = "," are default arguments for read.csv, and therefore unnecessary in your code.

  3. Subset each file as it is read, so that only the rows you need are kept.
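On point 1, it may help to know that the pattern argument of list.files is a regular expression, not a shell-style glob. The sketch below uses made-up file names (they are assumptions, not files from the question) to show an anchored expression and the glob2rx helper from the utils package:

```r
# Made-up file names, for illustration only
candidates <- c("prices_2011.csv", "prices_2011.csv.bak", "csv_notes.txt")

# Anchored regular expression: a literal dot, then "csv" at the end of the name
is_csv <- grepl("\\.csv$", candidates)
print(candidates[is_csv])

# glob2rx() translates a shell-style glob into an equivalent regular expression
is_csv_glob <- grepl(glob2rx("*.csv"), candidates)
print(candidates[is_csv_glob])
```

Either form can be passed as pattern to list.files; the anchored version avoids picking up names that merely contain "csv" somewhere.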

Here's a suggested method.

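fnames <- list.files(pattern = "\\.csv$")   # pattern is a regular expression, not a glob
read <- lapply(fnames, function(x) {
    rd <- read.csv(x, stringsAsFactors = FALSE)
    subset(rd, Price == 200)                # keep only the rows of interest
})
dat <- do.call(rbind, read)                 # stack the per-file subsets into one data frame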

And you should then be able to plot dat.
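As a rough sketch of that last step: the Date column is read as text in day/month/year order, so it probably needs converting before plotting. The small dat below is a made-up stand-in for the combined data, not the real files:

```r
# Hypothetical stand-in for the combined data frame `dat`
dat <- data.frame(
  Date   = c("20/03/2011", "18/03/2011", "19/03/2011"),
  Price  = c(200, 200, 200),
  Volume = c(18300, 18100, 18250),
  stringsAsFactors = FALSE
)

# Convert the day/month/year strings to the Date class
dat$Date <- as.Date(dat$Date, format = "%d/%m/%Y")

# Order by date so that the line runs from left to right
dat <- dat[order(dat$Date), ]

plot(Volume ~ Date, data = dat, type = "l")
```

Note that Price is compared with the number 200 rather than the string "200.00"; read.csv reads the column as numeric, so the string comparison would most likely match nothing.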

Upvotes: 2

Jaap

Reputation: 83215

When you want to combine all subsets for Price==200 into one plot, you can use the following function:

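plotprice <- function(x) {
  files <- list.files(pattern = "\\.csv$")   # pattern is a regular expression
  df <- data.frame()
  for (i in seq_along(files)) {              # seq_along does not iterate when no files are found
    xx <- read.csv(files[i])
    xx <- subset(xx, Price == x)             # keep only the rows with the requested price
    df <- rbind(df, xx)
  }
  df$Date <- as.Date(as.character(df$Date), format = "%d/%m/%Y")
  df <- df[order(df$Date), ]                 # sort by date so the line runs left to right
  plot(Volume ~ Date, df, ylim = c(15000, 45000), xlim = as.Date(c("2011-12-30", "2013-01-20")), type = "l")
}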

With plotprice(200) you will get everything in one plot for Price==200.


When you want a plot for each csv file, you can use:

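ploteach <- function(x) {
  files <- list.files(pattern = "\\.csv$")   # pattern is a regular expression
  for (i in seq_along(files)) {              # seq_along does not iterate when no files are found
    df <- read.csv(files[i])
    df <- subset(df, Price == x)             # keep only the rows with the requested price
    df$Date <- as.Date(as.character(df$Date), format = "%d/%m/%Y")
    # each call to plot starts a new plot, one per file
    plot(Volume ~ Date, df, ylim = c(15000, 45000), xlim = as.Date(c("2011-12-30", "2013-01-20")), type = "l")
  }
}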

ploteach(200)
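If you would rather keep one line per file but still see them all in a single window, one possible variation is to open an empty plot once and add each file with lines. This is only a sketch: per_file is a made-up stand-in for the subsets read from the csv files, with Date already converted.

```r
# Hypothetical stand-in for the per-file subsets (Date already converted)
per_file <- list(
  data.frame(Date = as.Date(c("2012-01-02", "2012-01-03")), Volume = c(20000, 21000)),
  data.frame(Date = as.Date(c("2012-01-04", "2012-01-05")), Volume = c(30000, 29000))
)

# Collect all values to work out common axis limits
all_dates   <- do.call(c, lapply(per_file, function(d) d$Date))
all_volumes <- unlist(lapply(per_file, function(d) d$Volume))

# Open one empty plot that spans the full range ...
plot(range(all_dates), range(all_volumes), type = "n",
     xlab = "Date", ylab = "Volume")

# ... then add one line per file to the same window
for (k in seq_along(per_file)) {
  lines(Volume ~ Date, data = per_file[[k]], col = k)
}
```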

Upvotes: 2

luis_js

Reputation: 611

OK, first you need to turn the result of your lapply/read.csv call from a list of 701 data frames into a single data frame.

Added function to read and subset, to avoid running out of RAM:

#
# function to read and subset data to avoid running out of RAM
read.subset <- function(dateiname){
   a <- read.csv(file = dateiname, header = TRUE, sep = ",",
                 stringsAsFactors = FALSE)
   a <- a[a$Price == 200.00,]
   print(gc())    # monitor and clean RAM after each file is read
   return(a)
}

* Update 2: Added a faster implementation of read.subset using scan

# function to read and subset data to avoid running out of RAM
read.subset.fast <- function(dateiname){
   # get data from csv into a data.frame
   a <- scan(file          = dateiname,
             what          = c(list(character()),
                               rep(list(numeric()),5),
                               list(character())),
             skip          = 1,  # skip header (equivalent to header = TRUE)
             sep           = ",")
   # transform efficiently list into data.frame
   attributes(a) <- list(class      = "data.frame",
                         row.names  = c(NA_integer_, length(a[[1]])),
                         names      = scan(file          = dateiname,
                                           what          = character(),
                                           skip          = 0,  
                                           nlines        = 1,  # just read first line to extract column names
                                           sep           = ","))
   # subset data
   a <- a[a$Price == 200.00,]
   print(gc())
   return(a)
}
#

Now let's read, subset and combine data in a single data frame:

#
allenamen <- list.files(pattern = "\\.csv$") # updated (@Richard Scriven); pattern is a regular expression
# get a single data frame, instead of a list of 701 data frames
alledat <- do.call(rbind, lapply(allenamen, read.subset.fast))
#
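To check the read, subset and combine step without the real data, here is a small self-contained sketch. The two temporary files, their contents and the helper read_price are made up for illustration; it uses plain read.csv rather than the scan version above.

```r
# Create a temporary directory with two tiny, made-up csv files
tmp <- tempfile("csvdemo")
dir.create(tmp)
header <- "Date,Week,Week Day,Hour,Price,Volume,Sale/Purchase"
writeLines(c(header,
             "18/03/2011,11,5,1,200.00,18300,Sell",
             "18/03/2011,11,5,1,-300.00,18118,Sell"),
           file.path(tmp, "day1.csv"))
writeLines(c(header,
             "19/03/2011,11,6,1,200.00,18450,Sell",
             "19/03/2011,11,6,1,150.00,18400,Sell"),
           file.path(tmp, "day2.csv"))

# Hypothetical helper: read one file and keep only the rows at the given price
read_price <- function(path, price = 200) {
  a <- read.csv(path, stringsAsFactors = FALSE)
  a[a$Price == price, ]
}

# Read every file, subset it, and stack the results into one data frame
files    <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read_price))
print(combined[, c("Date", "Price", "Volume")])

# Remove the temporary files again
unlink(tmp, recursive = TRUE)
```

Only one row per file survives the subset here, which is what keeps the combined data frame small even with hundreds of input files.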

Convert the dates to the Date class:

# get dates in dates format
alledat$Date <- as.Date(as.character(alledat$Date), format="%d/%m/%Y")

Then you are good to go, no function needed. Just plot it:

plot(Volume ~ Date, 
     data = alledat,
     ylim = range(Volume),
     xlim = range(Date),
     type = "l")

Upvotes: 0
