Reputation: 4087
I've imported a csv file with lots of columns and sections of data.
v <- read.csv2("200109.csv", header=TRUE, sep=",", skip="6", na.strings=c(""))
The layout of the file is something like this:
Dataset1
time, data, .....
0 0
0 <NA>
0 0
Dataset2
time, data, .....
00:00 0
0 <NA>
0 0
(The headers of the different datasets are exactly the same.)
Now, I can plot the first dataset with:
plot(as.numeric(as.character(v$Calls.served.by.agent[1:30])), type="l")
I am curious if there is a better way to:
Get all the numbers read as numbers, without having to convert.
Address the different datasets in the file, in some meaningful way.
Any hints would be appreciated. Thank you.
Status update:
I haven't really found a good solution yet in R, but I've started writing a script in Lua to separate each individual time series into its own file. I'm leaving this open for now, because I'm curious how well R will deal with all these files. I'll get 8 files per day.
Upvotes: 4
Views: 2558
Reputation: 44118
What I personally would do is to make a script in some scripting language to separate the different data sets before the file is read into R, and possibly do some of the necessary data conversions, too.
If you want to do the splitting in R, look up readLines and scan – read.csv2 is too high-level and is meant for reading a single data frame. You could write the different data sets into different files, or if you are ambitious, cook up file-like R objects that are usable with read.csv2 and read from the correct parts of the underlying big file.
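As a rough sketch of the readLines route: read the whole file as lines, find where each section starts, and hand each block of lines to read.csv through its text argument. The helper name read_sections and the "^Dataset" title pattern are my own assumptions based on the layout shown in the question; the inline lines vector stands in for readLines("200109.csv"). It also assumes every title line is followed by a header line.

```r
# Hypothetical helper: split a multi-section CSV into a named list of data frames.
# Assumes each section starts with a title line matching `title_pattern`,
# followed by one header line and then the data rows.
read_sections <- function(lines, title_pattern = "^Dataset", ...) {
  starts <- grep(title_pattern, lines)
  stopifnot(length(starts) > 0)
  # Each section ends just before the next title line (or at the end of the file).
  ends <- c(starts[-1] - 1, length(lines))
  sections <- Map(function(s, e) {
    # Skip the title line itself; the next line is the header.
    read.csv(text = lines[(s + 1):e], header = TRUE, ...)
  }, starts, ends)
  names(sections) <- trimws(lines[starts])
  sections
}

# Small made-up example standing in for readLines("200109.csv").
lines <- c(
  "Dataset1",
  "time,data",
  "0,0",
  "0,",
  "0,0",
  "Dataset2",
  "time,data",
  "1,5",
  "2,",
  "3,7"
)

# Extra arguments such as na.strings are passed through to read.csv.
sections <- read_sections(lines, na.strings = "")
str(sections$Dataset2)
```

Each data set can then be addressed by name, for example plot(sections$Dataset1$data, type = "l"), instead of by row ranges in one big data frame.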
Once you have dealt with separating the data sets into different files, use read.csv2 on those (or whichever read.table variant is best – if those are not tabs but fixed-width fields, see read.fwf). If <NA> indicates "not available" in your file, be sure to specify it as part of na.strings. If you don't do that, R thinks you have non-numeric data in that field, but with the right na.strings, you automatically get the field converted into numbers. It seems that one of your fields can include time stamps like 00:00, so you need to use colClasses and specify a class to which your time stamp format can be converted. If the built-in Date class doesn't work, just define your own timestamp class and an as.timestamp function that does the conversion.
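Here is a minimal sketch of the na.strings plus colClasses idea. For a class that read.table does not know, the conversion is supplied as an as method from the methods package, registered with setAs. The class name "hhmm" and the choice to store minutes since midnight are purely illustrative assumptions, as is the sample data; adapt both to whatever your time stamps really mean.

```r
library(methods)  # provides setClass() and setAs()

# Hypothetical class "hhmm": the conversion turns "HH:MM" into minutes since midnight.
setClass("hhmm")
setAs("character", "hhmm", function(from) {
  parts <- strsplit(from, ":", fixed = TRUE)
  vapply(parts, function(p) {
    # Anything that is not exactly "HH:MM" (including NA) becomes NA.
    if (length(p) == 2) as.numeric(p[1]) * 60 + as.numeric(p[2]) else NA_real_
  }, numeric(1))
})

# Made-up sample data in place of one of the separated files.
csv_text <- c(
  "time,data",
  "00:00,0",
  "00:30,<NA>",
  "01:15,4"
)

d <- read.csv(
  text = csv_text,
  na.strings = c("", "<NA>"),                     # treat both as missing
  colClasses = c(time = "hhmm", data = "numeric") # convert while reading
)
str(d)
```

With this, the data column arrives as numeric with proper NA values, so the as.numeric(as.character(...)) step from the question should no longer be needed.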
Upvotes: 3