Reputation: 3941
1st off, sorry for not having any reproducible data here, but I can't figure out how to reproduce this problem. But I will do my best to include a step wise list of what I have done as well as any pertinent information. Any thoughts on troubleshooting would be greatly appreciated.
My problem is this:
I have a large time series data set that I read in to R. I eventually transform to zoo, but for now I keep it as a data frame. Using read.csv
I read the data into R. Using str
to look at the data I get this:
> str(Met)
'data.frame': 568354 obs. of 18 variables:
$ time_local : Factor w/ 568354 levels "2006-08-06 03:15:00",..: 1 2 3 4 5 6 7 8 9 10 ...
Note- Met$time_local is what I am concerned with and I've removed all other columns of the str readout.
If I search for duplicates using
Dup<-Met$time_local[duplicated(Met$time_local)]
I get nothing
str(Dup)
Factor w/ 568354 levels "2006-08-06 03:15:00",..:
If I transform the Date/Time data to a POSIXlt or POSIXct object using strptime
MetStrp<-strptime(Met$time_local, "%Y-%m-%d %H:%M:%S")
str(MetStrp)
POSIXlt[1:568354], format: "2006-08-06 03:15:00" "2006-08-06 03:20:00" "2006-08-06 03:25:00" ...
and then search fro duplicates
Dup<-MetStrp[duplicated(MetStrp)]
> head(Dup)
[1] "2007-03-11 02:00:00" "2007-03-11 02:05:00" "2007-03-11 02:10:00"
[4] "2007-03-11 02:15:00" "2007-03-11 02:20:00" "2007-03-11 02:25:00"
> str(Dup)
POSIXlt[1:60], format: "2007-03-11 02:00:00" "2007-03-11 02:05:00" "2007-03-11 02:10:00" ...
I now have 60 duplicates (which throw off things later when I create a zoo object).
And interestingly, if I change the POSIXlt format to POSIXct
ct<-as.POSIXct(MetStrp)
str(ct)
POSIXct[1:568354], format: "2006-08-06 03:15:00" "2006-08-06 03:20:00" "2006-08-06 03:25:00" ...
I get the same duplicates but offset by an hour
Dup<-ct[duplicated(ct)]
> head(Dup)
[1] "2007-03-11 01:00:00 PST" "2007-03-11 01:05:00 PST" "2007-03-11 01:10:00 PST"
[4] "2007-03-11 01:15:00 PST" "2007-03-11 01:20:00 PST" "2007-03-11 01:25:00 PST"
> str(Dup)
POSIXct[1:60], format: "2007-03-11 01:00:00" "2007-03-11 01:05:00" "2007-03-11 01:10:00" ...
If I choose to look for duplicate locations using
Dup_loc<-which(duplicated(MetStrp) | duplicated(MetStrp,fromLast=TRUE))
I get 120 duplicate locations. Which end up being the combination of the POSIXlt and POSIXct duplicates.
str(Dup_loc)
int [1:120] 62470 62471 62472 62473 62474 62475 62476 62477 62478 62479 ...
With the POSIXct dates always being from hours 1-2, and the POSIClt dates always being from hours 2-3
To see the duplicates:
Test<-MetStrp[Dup_loc]
>Test
[1] "2007-03-11 01:00:00" "2007-03-11 01:05:00" "2007-03-11 01:10:00"
[4] "2007-03-11 01:15:00" "2007-03-11 01:20:00" "2007-03-11 01:25:00"
[7] "2007-03-11 01:30:00" "2007-03-11 01:35:00" "2007-03-11 01:40:00"
[10] "2007-03-11 01:45:00" "2007-03-11 01:50:00" "2007-03-11 01:55:00"
[13] "2007-03-11 02:00:00" "2007-03-11 02:05:00" "2007-03-11 02:10:00"
[16] "2007-03-11 02:15:00" "2007-03-11 02:20:00" "2007-03-11 02:25:00"
[19] "2007-03-11 02:30:00" "2007-03-11 02:35:00" "2007-03-11 02:40:00"
[22] "2007-03-11 02:45:00" "2007-03-11 02:50:00" "2007-03-11 02:55:00"
[25] "2008-03-09 01:00:00" "2008-03-09 01:05:00" "2008-03-09 01:10:00"
[28] "2008-03-09 01:15:00" "2008-03-09 01:20:00" "2008-03-09 01:25:00"
[31] "2008-03-09 01:30:00" "2008-03-09 01:35:00" "2008-03-09 01:40:00"
[34] "2008-03-09 01:45:00" "2008-03-09 01:50:00" "2008-03-09 01:55:00"
[37] "2008-03-09 02:00:00" "2008-03-09 02:05:00" "2008-03-09 02:10:00"
[40] "2008-03-09 02:15:00" "2008-03-09 02:20:00" "2008-03-09 02:25:00"
[43] "2008-03-09 02:30:00" "2008-03-09 02:35:00" "2008-03-09 02:40:00"
[46] "2008-03-09 02:45:00" "2008-03-09 02:50:00" "2008-03-09 02:55:00"
[49] "2009-03-08 01:00:00" "2009-03-08 01:05:00" "2009-03-08 01:10:00"
[52] "2009-03-08 01:15:00" "2009-03-08 01:20:00" "2009-03-08 01:25:00"
[55] "2009-03-08 01:30:00" "2009-03-08 01:35:00" "2009-03-08 01:40:00"
[58] "2009-03-08 01:45:00" "2009-03-08 01:50:00" "2009-03-08 01:55:00"
[61] "2009-03-08 02:00:00" "2009-03-08 02:05:00" "2009-03-08 02:10:00"
[64] "2009-03-08 02:15:00" "2009-03-08 02:20:00" "2009-03-08 02:25:00"
[67] "2009-03-08 02:30:00" "2009-03-08 02:35:00" "2009-03-08 02:40:00"
[70] "2009-03-08 02:45:00" "2009-03-08 02:50:00" "2009-03-08 02:55:00"
[73] "2010-03-14 01:00:00" "2010-03-14 01:05:00" "2010-03-14 01:10:00"
[76] "2010-03-14 01:15:00" "2010-03-14 01:20:00" "2010-03-14 01:25:00"
[79] "2010-03-14 01:30:00" "2010-03-14 01:35:00" "2010-03-14 01:40:00"
[82] "2010-03-14 01:45:00" "2010-03-14 01:50:00" "2010-03-14 01:55:00"
[85] "2010-03-14 02:00:00" "2010-03-14 02:05:00" "2010-03-14 02:10:00"
[88] "2010-03-14 02:15:00" "2010-03-14 02:20:00" "2010-03-14 02:25:00"
[91] "2010-03-14 02:30:00" "2010-03-14 02:35:00" "2010-03-14 02:40:00"
[94] "2010-03-14 02:45:00" "2010-03-14 02:50:00" "2010-03-14 02:55:00"
[97] "2011-03-13 01:00:00" "2011-03-13 01:05:00" "2011-03-13 01:10:00"
[100] "2011-03-13 01:15:00" "2011-03-13 01:20:00" "2011-03-13 01:25:00"
[103] "2011-03-13 01:30:00" "2011-03-13 01:35:00" "2011-03-13 01:40:00"
[106] "2011-03-13 01:45:00" "2011-03-13 01:50:00" "2011-03-13 01:55:00"
[109] "2011-03-13 02:00:00" "2011-03-13 02:05:00" "2011-03-13 02:10:00"
[112] "2011-03-13 02:15:00" "2011-03-13 02:20:00" "2011-03-13 02:25:00"
[115] "2011-03-13 02:30:00" "2011-03-13 02:35:00" "2011-03-13 02:40:00"
[118] "2011-03-13 02:45:00" "2011-03-13 02:50:00" "2011-03-13 02:55:00"
As far as I can see, I don't see any duplicate time stamps above. So I'm not sure what is the matter, but something is amiss.
And as far as I can tell, all I have done is transformed a factor data set into a time based data set. So I have no idea why I am getting a duplicate error in zoo, and finding duplicates using duplicated
when there does not appear to be any.
Again, any thoughts on this matter would be greatly appreciated.
Upvotes: 0
Views: 251
Reputation: 263332
I have three words for you: " daylight savings time". I predict on the basis of the evidence offered that in your locale that March 11 2007 was the date when the Daylight Savings Time shift occurred. Notice that they occur in the time frame of 1-2 AM.
Upvotes: 2