tovare

Reputation: 4087

csv file with multiple time-series

I've imported a csv file with lots of columns and sections of data.

v <- read.csv2("200109.csv", header=TRUE, sep=",", skip=6, na.strings=c(""))

The layout of the file is something like this:

Dataset1
time, data, .....
0       0
0       <NA>
0       0

Dataset2
time, data, .....
00:00   0
0       <NA>
0       0

(The headers of the different datasets are exactly the same.)

Now, I can plot the first dataset with:

plot(as.numeric(as.character(v$Calls.served.by.agent[1:30])), type="l")

I am curious if there is a better way to:

  1. Get all the numbers read as numbers, without having to convert.

  2. Address the different datasets in the file in some meaningful way.

Any hints would be appreciated. Thank you.


Status update:

I haven't really found a good solution in R yet, but I've started writing a script in Lua to separate each individual time-series into a separate file. I'm leaving this open for now, because I'm curious how well R will deal with all these files. I'll get 8 files per day.

Upvotes: 4

Views: 2558

Answers (1)

Jouni K. Seppänen

Reputation: 44118

What I personally would do is to make a script in some scripting language to separate the different data sets before the file is read into R, and possibly do some of the necessary data conversions, too.

If you want to do the splitting in R, look up readLines and scan; read.csv2 is too high-level and is meant for reading a single data frame. You could write the different data sets into different files, or if you are ambitious, cook up file-like R objects that are usable with read.csv2 and read from the correct parts of the underlying big file.
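As a rough illustration of the readLines route, here is a minimal sketch. It assumes that each data set starts with a title line (such as Dataset1), followed by a header line, and that data sets are separated by blank lines; the sample lines and the function name split_datasets are made up for the example, so adjust them to the real layout of your file.

```r
# Hypothetical sample mimicking the layout in the question.
# With the real file you would use something like:
#   lines <- readLines("200109.csv")
lines <- c(
  "Dataset1",
  "time,data",
  "0,0",
  "0,",
  "0,0",
  "",
  "Dataset2",
  "time,data",
  "00:00,0",
  "0,",
  "0,0"
)

# split_datasets is an illustrative helper, not part of base R.
split_datasets <- function(lines) {
  blank <- !nzchar(trimws(lines))
  # The block counter goes up by one at every blank line
  block_id <- cumsum(blank)
  blocks <- split(lines[!blank], block_id[!blank])
  # First line of each block is the title, the rest is an ordinary CSV table
  result <- lapply(blocks, function(b) {
    read.csv(text = b[-1], header = TRUE,
             na.strings = c("", "<NA>"), stringsAsFactors = FALSE)
  })
  names(result) <- vapply(blocks, function(b) b[1], character(1),
                          USE.NAMES = FALSE)
  result
}

datasets <- split_datasets(lines)
str(datasets)
```

Each element of the resulting list is a data frame of its own, so something like plot(datasets$Dataset1$data, type="l") should work without any as.numeric(as.character(...)) conversion, provided the missing values are covered by na.strings.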

Once you have dealt with separating the data sets into different files, use read.csv2 on those (or whichever read.table variant is best – if those are not tabs but fixed-width fields, see read.fwf). If <NA> indicates "not available" in your file, be sure to specify it as part of na.strings. If you don't do that, R thinks you have non-numeric data in that field, but with the right na.strings, you automatically get the field converted into numbers. It seems that one of your fields can include time stamps like 00:00, so you need to use colClasses and specify a class to which your time stamp format can be converted. If the built-in Date class doesn't work, just define your own timestamp class and an as.timestamp function that does the conversion.
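To make the na.strings and colClasses advice concrete, here is a small sketch. The class name timestamp, the sample data, and the choice to store times as minutes since midnight are all assumptions made for the example; it relies on read.table calling methods::as for any colClasses entry that is not one of the basic types, which requires a setAs method from "character".

```r
library(methods)

# Hypothetical class used only as a conversion target for colClasses
setClass("timestamp")

# Convert "HH:MM" strings to minutes since midnight (an arbitrary choice
# for this sketch); anything that does not look like HH:MM becomes NA
setAs("character", "timestamp", function(from) {
  parts <- strsplit(from, ":", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 2L) return(NA_real_)
    as.numeric(p[1]) * 60 + as.numeric(p[2])
  }, numeric(1))
})

# Made-up sample of one already-separated data set
csv_text <- c(
  "time,data",
  "00:00,0",
  "00:30,<NA>",
  "01:00,5"
)

d <- read.csv(text = csv_text,
              na.strings = c("", "<NA>"),
              colClasses = c("timestamp", "numeric"))
str(d)
```

With "<NA>" listed in na.strings, the data column is read directly as numeric with a missing value in the second row. Without it, forcing the column to "numeric" would fail, and leaving colClasses unset would give you a non-numeric column instead.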

Upvotes: 3
