Reputation: 6447
I have a CSV file with some time series data, with the date-time column formatted like this:
mydata <- read.csv("mydata.csv")
> mode(mydata$t_5min[1])
[1] "numeric"
It looks like R is interpreting it as a factor, since it can't understand the format:
mydata$t_5min[1]
[1] 1/3/2012 16:00
27698 Levels: 10/10/2012 10:00 10/10/2012 10:05 10/10/2012 10:10 10/10/2012 10:15 ... 9/6/2012 9:55
I tried using strptime, which seems to work okay on single entries:
> strptime(x=mydata$t_5min[2],format="%d/%m/%Y %H:%M", tz="")
[1] "2012-04-01 06:10:00"
> mode(strptime(x=mydata$t_5min[2],format="%d/%m/%Y %H:%M", tz=""))
[1] "list"
But if I try this with sapply, I get the following error:
mydata$t_5min <- sapply(mydata$t_5min, strptime, format="%d/%m/%Y %H:%M", tz="")
Error in `$<-.data.frame`(`*tmp*`, "t_5min", value = list(sec = 0, min = 0L, :
replacement has 9000 rows, data has 1000
I tried the timeDate library with slightly better results:
> as.timeDate(mydata$t_5min[1])
GMT
[1] [2012-01-03]
However, I need minute precision, and the example code in the timeDate documentation doesn't seem to work (or I'm using it wrong, but it was kind of brief):
as.timeDate(mydata$t_5min[2], units=c("min"))
Error in as.timeDate(mydata$t_5min[2], units = c("min")) :
unused argument(s) (units = c("min"))
What's the right way to convert this time data into something R can work with?
Here is some data to duplicate these results:
t_5min,n,value
1/3/2012 16:00,16,48.125
1/3/2012 16:05,28,44.39285714
1/3/2012 16:10,29,37.44827586
1/3/2012 16:15,28,30.39285714
1/3/2012 16:20,28,23.67857143
1/3/2012 16:25,29,19.10344828
1/3/2012 16:30,28,16.35714286
1/3/2012 16:35,29,14.34482759
1/3/2012 16:40,28,11.71428571
Upvotes: 2
Views: 1151
Reputation: 8753
Try this, where df is your data frame:
> as.POSIXlt(as.character(df$t_5min), format="%d/%m/%Y %H:%M")
[1] "2012-03-01 16:00:00" "2012-03-01 16:05:00" "2012-03-01 16:10:00"
[4] "2012-03-01 16:15:00" "2012-03-01 16:20:00" "2012-03-01 16:25:00"
[7] "2012-03-01 16:30:00" "2012-03-01 16:35:00" "2012-03-01 16:40:00"
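To store the result back in the data frame, it may be easier to convert to POSIXct rather than POSIXlt. The error message in the question suggests that each POSIXlt value was unpacked into its list components (sec, min, and so on), which is why the sapply result did not fit the column. The sketch below converts the whole column in one vectorised call. It assumes the column was read as a factor (forced here with stringsAsFactors = TRUE) and uses tz = "UTC" purely so the result is reproducible; neither choice comes from the original posts.

```r
# Sample rows copied from the question. stringsAsFactors = TRUE is an
# assumption that mimics the factor column shown in the question.
mydata <- read.csv(text = "t_5min,n,value
1/3/2012 16:00,16,48.125
1/3/2012 16:05,28,44.39285714
1/3/2012 16:10,29,37.44827586", stringsAsFactors = TRUE)

# Convert the whole column at once; no sapply is needed because the
# conversion is vectorised. as.character() turns the factor into its labels.
# tz = "UTC" is an assumption made here for reproducibility.
mydata$t_5min <- as.POSIXct(as.character(mydata$t_5min),
                            format = "%d/%m/%Y %H:%M", tz = "UTC")

class(mydata$t_5min)   # "POSIXct" "POSIXt"
diff(mydata$t_5min)    # spacing between consecutive rows (5 minutes here)
```

POSIXct stores each time as a single number, so it fits in a data frame column without the list structure that POSIXlt carries.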
Upvotes: 2
Reputation: 121568
You can use read.zoo to read your data directly in the right format:
library(zoo)
## replace text=... here with file = "mydata.csv"
read.zoo(text='
t_5min,n,value
1/3/2012 16:00,16,48.125
1/3/2012 16:05,28,44.39285714
1/3/2012 16:10,29,37.44827586
1/3/2012 16:15,28,30.39285714
1/3/2012 16:20,28,23.67857143
1/3/2012 16:25,29,19.10344828
1/3/2012 16:30,28,16.35714286
1/3/2012 16:35,29,14.34482759
1/3/2012 16:40,28,11.71428571',header=TRUE,format="%d/%m/%Y %H:%M", tz="",sep=',')
n value
2012-03-01 16:00:00 16 48.12500
2012-03-01 16:05:00 28 44.39286
2012-03-01 16:10:00 29 37.44828
2012-03-01 16:15:00 28 30.39286
2012-03-01 16:20:00 28 23.67857
2012-03-01 16:25:00 29 19.10345
2012-03-01 16:30:00 28 16.35714
2012-03-01 16:35:00 29 14.34483
2012-03-01 16:40:00 28 11.71429
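Once the data is in a zoo object, the time index and the values can be pulled apart again. The following is a minimal sketch using the first three sample rows; the object name z and tz = "UTC" are assumptions for illustration, not part of the answer above.

```r
library(zoo)

# Same call as above, on three sample rows; tz = "UTC" is an assumption
# chosen so the example does not depend on the local time zone.
z <- read.zoo(text = "t_5min,n,value
1/3/2012 16:00,16,48.125
1/3/2012 16:05,28,44.39285714
1/3/2012 16:10,29,37.44827586",
              header = TRUE, sep = ",",
              format = "%d/%m/%Y %H:%M", tz = "UTC")

index(z)      # the parsed date-times
coredata(z)   # the underlying numeric matrix with columns n and value

# Keep only the observations from 16:05 onwards.
window(z, start = as.POSIXct("2012-03-01 16:05", tz = "UTC"))
```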
Upvotes: 3