Reputation: 349
Background: For my thesis i have several hundred large CSV files. These files contain time series with 23800 forecast half-hourly periods between the 01.11.2016 and 01.03.2018 of weather parameters. Further, each period has 20 or 21 runtimes (number is not fixed, which is basically the problem in my case). The runtimes mark the time for a certain forecast period when the forecast is calculated. Thus, a forecast is mostly calculated before the forecast time (which naturally makes sense) but, for whatever reason this is not always the case. For some periods (mostly but not always) between 09:00 am and 12:00am there is one runtime for each period which is calculated in the future. I don't want to have this "future" runtime (I can't make a sense of it why it is included)
Exemplary excerpt of the data:
+-----------------------+-----------------------+-------------+
| ForecastPeriod | Runtime | Value |
+-----------------------+-----------------------+-------------+
| … | … | … |
| 02.11.2016 11:30+0000 | 31.10.2016 00:00+0000 | 5.544368776 |
| 02.11.2016 11:30+0000 | 31.10.2016 12:00+0000 | 4.71684533 |
| 02.11.2016 11:30+0000 | 01.11.2016 00:00+0000 | 5.374274986 |
| 02.11.2016 11:30+0000 | 01.11.2016 12:00+0000 | 5.892114875 |
| 02.11.2016 11:30+0000 | 02.11.2016 00:00+0000 | 6.18387462 | <-i want this row
| 02.11.2016 11:30+0000 | 02.11.2016 12:00+0000 | 5.852306909 | <- don't make sense
| 02.11.2016 12:00+0000 | 23.10.2016 12:00+0000 | 14.81608444 |
| 02.11.2016 12:00+0000 | 24.10.2016 00:00+0000 | 3.637574565 |
| … | … | ... |
| 02.11.2016 12:00+0000 | 01.11.2016 12:00+0000 | 5.541325144 |
| 02.11.2016 12:00+0000 | 02.11.2016 00:00+0000 | 5.745831136 | <- i want this row
| 02.11.2016 12:00+0000 | 02.11.2016 12:00+0000 | 5.347949883 | <- don't make sense
| 02.11.2016 12:30+0000 | 24.10.2016 00:00+0000 | 3.80366064 |
| 02.11.2016 12:30+0000 | 24.10.2016 12:00+0000 | 5.533042696 |
| … | … | … |
| 02.11.2016 12:30+0000 | 01.11.2016 12:00+0000 | 5.429153394 |
| 02.11.2016 12:30+0000 | 02.11.2016 00:00+0000 | 5.580232543 |
| 02.11.2016 12:30+0000 | 02.11.2016 12:00+0000 | 5.266140403 | <- i want this row
| 02.11.2016 13:00+0000 | 24.10.2016 00:00+0000 | 3.969746715 | <- here is no "future" runtime
| 02.11.2016 13:00+0000 | 24.10.2016 12:00+0000 | 5.704328337 |
| … | … | … |
+-----------------------+-----------------------+-------------+
Now my working solution: What i am doing now is, to loop through the large data frame and filter the data which meets my expectations. It works, but it is very slow on my laptop. (takes almost an hour to go through 500.000 rows), i have a plethora of csv files to go through... I ask myself, if there are any possibilities to do it faster? I am also ok with additional R packages if they work faster. Besides i am thinking of uploading the data to a faster SQL Server; is processing such a date comparison task faster on SQL?
#Some preliminary transformations for the comparable posixct format:
LA_Date_EC$Forecast.Time<-as.POSIXlt(LA_Date_EC$Forecast.Time,format="%d.%m.%Y %H:%M+%S",tz="UTC")
LA_Date_EC$Runtime.Forecast<-as.POSIXlt(LA_Date_EC$Runtime.Forecast,format="%d.%m.%Y %H:%M+%S",tz="UTC")
test.df2<-data.frame()
names(test.df2)<-names(LA_Date_EC) ##info: Datetimes
for (l in 2:469501){
if(LA_Date_EC[l,1]!=LA_Date_EC[l+1,1]){
#print(l)
if(LA_Date_EC[l,2]>=LA_Date_EC[l,1]){
test.df2<-rbind.data.frame(test.df2,LA_Date_EC[l-1,])
}else{
test.df2<-rbind.data.frame(test.df2,LA_Date_EC[l,])
}
}
}
EDIT: sample excerpt as an dput output in R:
structure(list(Forecast.Time = structure(list(sec = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), min = c(30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 30L, 30L, 30L, 30L), hour = c(10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L), mday = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), mon = c(10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L
), year = c(116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L), wday = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
yday = c(306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("sec", "min",
"hour", "mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt",
"POSIXt"), tzone = "UTC"), Runtime.Forecast = structure(list(
sec = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), min = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 12L,
0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 12L,
0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L,
12L, 0L, 12L, 0L, 12L, 0L, 12L, 12L, 0L, 12L, 0L, 12L, 0L,
12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L,
0L, 12L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L,
0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L,
12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L,
0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L,
12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L,
0L, 12L), mday = c(27L, 27L, 28L, 28L, 29L, 29L, 30L, 30L,
31L, 31L, 1L, 1L, 2L, 2L, 23L, 24L, 24L, 25L, 25L, 26L, 26L,
27L, 27L, 28L, 28L, 29L, 29L, 30L, 30L, 31L, 31L, 1L, 1L,
2L, 2L, 23L, 24L, 24L, 25L, 25L, 26L, 26L, 27L, 27L, 28L,
28L, 29L, 29L, 30L, 30L, 31L, 31L, 1L, 1L, 2L, 2L, 23L, 24L,
24L, 25L, 25L, 26L, 26L, 27L, 27L, 28L, 28L, 29L, 29L, 30L,
30L, 31L, 31L, 1L, 1L, 2L, 2L, 24L, 24L, 25L, 25L, 26L, 26L,
27L, 27L, 28L, 28L, 29L, 29L, 30L, 30L, 31L, 31L, 1L, 1L,
2L, 2L, 24L, 24L, 25L, 25L, 26L, 26L, 27L, 27L, 28L, 28L,
29L, 29L, 30L, 30L, 31L, 31L, 1L, 1L, 2L, 2L, 24L, 24L, 25L,
25L), mon = c(9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L,
10L, 10L, 10L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L,
10L, 10L, 10L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L,
10L, 10L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 9L, 9L, 9L, 9L), year = c(116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L
), wday = c(4L, 4L, 5L, 5L, 6L, 6L, 0L, 0L, 1L, 1L, 2L, 2L,
3L, 3L, 0L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L,
0L, 0L, 1L, 1L, 2L, 2L, 3L, 3L, 0L, 1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L, 0L, 0L, 1L, 1L, 2L, 2L, 3L, 3L, 0L,
1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 0L, 0L, 1L,
1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L,
6L, 6L, 0L, 0L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L,
3L, 4L, 4L, 5L, 5L, 6L, 6L, 0L, 0L, 1L, 1L, 2L, 2L, 3L, 3L,
1L, 1L, 2L, 2L), yday = c(300L, 300L, 301L, 301L, 302L, 302L,
303L, 303L, 304L, 304L, 305L, 305L, 306L, 306L, 296L, 297L,
297L, 298L, 298L, 299L, 299L, 300L, 300L, 301L, 301L, 302L,
302L, 303L, 303L, 304L, 304L, 305L, 305L, 306L, 306L, 296L,
297L, 297L, 298L, 298L, 299L, 299L, 300L, 300L, 301L, 301L,
302L, 302L, 303L, 303L, 304L, 304L, 305L, 305L, 306L, 306L,
296L, 297L, 297L, 298L, 298L, 299L, 299L, 300L, 300L, 301L,
301L, 302L, 302L, 303L, 303L, 304L, 304L, 305L, 305L, 306L,
306L, 297L, 297L, 298L, 298L, 299L, 299L, 300L, 300L, 301L,
301L, 302L, 302L, 303L, 303L, 304L, 304L, 305L, 305L, 306L,
306L, 297L, 297L, 298L, 298L, 299L, 299L, 300L, 300L, 301L,
301L, 302L, 302L, 303L, 303L, 304L, 304L, 305L, 305L, 306L,
306L, 297L, 297L, 298L, 298L), isdst = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("sec",
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt"), tzone = "UTC"), Wind.Speed = c(12.0889469204481,
8.1534483762018, 11.229031832199, 9.51623004872928, 7.99700924410322,
8.06420698869646, 7.46190421726437, 7.95691440205356, 8.19089703425263,
7.50023772800533, 7.46471349405832, 7.87503218264228, 8.10704533381368,
8.25997087655172, 12.9641999142878, 5.95739166070848, 8.48709144265445,
12.3686489749888, 3.27377438788927, 3.8132283355639, 5.40513611081943,
12.3699466614361, 7.91484229489558, 11.0188269744693, 9.56301437212706,
7.91921747636113, 7.86903553214633, 7.4208161449472, 7.7049451673898,
8.02618971449148, 7.32764074016071, 7.26610021373866, 7.70408467708526,
7.90262085370489, 8.1065215773556, 13.5223226992998, 6.23422083753905,
8.45254447734072, 12.3148462884236, 3.00095427388565, 4.11192170847009,
5.35120820775642, 12.6509464024241, 7.67623621358937, 10.8086221167397,
9.60979869552483, 7.84142570861904, 7.67386407559621, 7.37972807263003,
7.45297593272604, 7.86148239473033, 7.15504375231609, 7.06748693341901,
7.53313717152826, 7.69819637359609, 7.95307227815948, 14.0804454843119,
6.51105001436962, 8.41799751202699, 12.2610436018583, 2.72813415988203,
4.41061508137628, 5.2972803046934, 12.9319461434121, 7.43763013228316,
10.59841725901, 9.65658301892261, 7.76363394087695, 7.47869261904608,
7.33864000031286, 7.20100669806228, 7.69677507496918, 6.98244676447147,
6.86887365309936, 7.36218966597125, 7.4937718934873, 7.79962297896336,
6.47610742140243, 8.44809907428991, 12.1754295175459, 2.80289574128868,
4.43071887689015, 5.25901387681356, 12.8048270636345, 7.77257529660677,
10.6689406707837, 9.67371178278272, 7.71232463800448, 7.639287068313,
7.37678432847625, 7.39386920787284, 7.65056355621861, 7.0961073828294,
6.94340806177623, 7.41655132109855, 7.53010844435008, 7.89628470472931,
6.44116482843524, 8.47820063655284, 12.0898154332335, 2.87765732269532,
4.450822672404, 5.22074744893371, 12.6777079838569, 8.10752046093037,
10.7394640825575, 9.69084054664283, 7.66101533513201, 7.79988151757991,
7.41492865663964, 7.58673171768341, 7.60435203746804, 7.20976800118733,
7.01794247045311, 7.47091297622585, 7.56644499521287, 7.99294643049527,
6.40622223546805, 8.50830219881576, 12.0042013489211, 2.95241890410197
)), .Names = c("Forecast.Time", "Runtime.Forecast", "Wind.Speed"
), row.names = 1400:1520, class = "data.frame")
Upvotes: 0
Views: 327
Reputation: 24079
Here is a potential solution using the dplyr package. With the lead/lag function and group_by the loop was eliminated.
As I mentioned in my comment above I converted the date/times to POSIXct objects.
library(dplyr)
#df is a copy of the orginal data
df<-LA_Date_EC
#find all future values and remove them from the data
future<-LA_Date_EC[,2]>=lag(LA_Date_EC[,1])
future[1]<-FALSE
df<-df[!future,]
#Group by the Forecast time and then find the last row
answer<-df %>% group_by(Forecast.Time) %>%
summarize(Runtime.Forecas= last(Runtime.Forecast), Wind.Speed = last(Wind.Speed))
Upvotes: 1
Reputation: 1106
You could do this with dplyr
or data.table
. data.table
should be the quicker solution for you.
dplyr
library(dplyr)
df$Forecast.Time <- as.POSIXct(df$Forecast.Time)
df$Runtime.Forecast <- as.POSIXct(df$Runtime.Forecast)
filtered <- df %>% filter(Forecast.Time > Runtime.Forecast) %>%
group_by(Forecast.Time) %>%
summarise_all(funs(last))
data.table
library(data.table)
df_dt <- as.data.table(df)
filtered_dt <- dat_dt[Forecast.Time > Runtime.Forecast, lapply(.SD, last), by = Forecast.Time]
Upvotes: 2