I want to pull my hair out on this one...
I read in the data with code like this:
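library(dplyr)

holidays <- read.csv("~/xxx/holiday_sample.csv") %>%
  # "ï..DATE" is how the first column name comes through when the CSV starts with a UTF-8 byte-order mark
  rename(DATE = "ï..DATE") %>%
  mutate(DATE = as.Date(DATE, format = "%m/%d/%Y"))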
The resulting holidays data frame looks like this:
holidays <- structure(list(DATE = structure(c(17532, 17533, 17534, 17546,
17547, 17548, 17549, 17575, 17576, 17577, 17620, 17621, 17622,
17678, 17679, 17680, 17681, 17682, 17713, 17714, 17715, 17716,
17717, 17774, 17775, 17776, 17777, 17778, 17812, 17847, 17855,
17856, 17857, 17858, 17859, 17860, 17884, 17885, 17886, 17887,
17888, 17889, 17890, 17891, 17892, 17893, 17894, 17895, 17896
), class = "Date"), REASON = c("New Years Day", "New Years Travel",
"New Years Travel", "Lee-Jackson Day", "Lee-Jackson-King Travel Day",
"Lee-Jackson-King Travel Day", "Martin Luther King, Jr. Day",
"Presidents Day Travel", "Presidents Day Travel", "Presidents Day",
"Easter Travel", "Easter Travel", "Easter", "Memorial Day Travel",
"Memorial Day Travel", "Memorial Day Travel", "Memorial Day",
"Memorial Day Travel", "Independence Day Travel", "Independence Day Travel",
"Independence Day Travel", "Independence Day", "Independence Day Travel",
"Labor Day Travel", "Labor Day Travel", "Labor Day Travel", "Labor Day",
"Labor Day Travel", "Columbus Day", "Veterans Day", "Thanksgiving Travel",
"Thanksgiving Travel", "Thanksgiving Day", "Thanksgiving Travel",
"Thanksgiving Travel", "Thanksgiving Travel", "Christmas Travel",
"Christmas Travel", "Christmas Travel", "Christmas Travel", "Christmas Travel",
"Christmas Travel", "Christmas Day", "Christmas Travel", "Christmas Travel",
"Christmas Travel", "Christmas Travel", "Christmas Travel", "New Years Travel"
)), class = "data.frame", row.names = c(NA, -49L))
I want to loop through another data frame, bottleneck2, to see which of its rows fall on a holiday:
bottleneck2 <- structure(list(startTime = structure(c(1519903920, 1519905060,
1519913640), class = c("POSIXct", "POSIXt"), tzone = "America/New_York"),
endTime = structure(c(1519904880, 1519912200, 1519914540), class = c("POSIXct",
"POSIXt"), tzone = "America/New_York"), impact = c(92.17,
616.43, 63.69), impactPercent = c(184.15, 1495.17, 138.69
), impactSpeedDiff = c(3587.72, 25726.22, 2616.01), maxQueueLength = c(5.76053,
5.76053, 4.829511), tmcs = list(c("110N04623", "110-04623",
"110N04624", "110-04624", "110N04625", "110-04625", "110N04626",
"110-04626", "110N04627"), c("110N04623", "110-04623", "110N04624",
"110-04624", "110N04625", "110-04625", "110N04626", "110-04626",
"110N04627"), c("110N04623", "110-04623", "110N04624", "110-04624",
"110N04625", "110-04625", "110N04626", "110-04626")), early_startTime = structure(c(1519903620,
1519904760, 1519913340), class = c("POSIXct", "POSIXt"), tzone = "America/New_York")), row.names = c(NA,
3L), class = "data.frame")
But when I run the following, I get a syntax error that makes zero sense:
holiday_match <- lapply(1:nrow(bottleneck2), function(x) {
bottleneck_row <- bottleneck2[x,]
holidays[which(holidays$DATE = as.Date(bottleneck_row$early_startTime) |
holidays$DATE = as.Date(bottleneck_row$endTime) == TRUE),]
})
ERROR: Error: unexpected '}' in " }"
And when I save (and source) the file, I get another error:
Error in source("~/xxx/example.R") :
~/xxx/example.R:226:32: unexpected '='
225: bottleneck_row <- bottleneck2[x,]
226: holidays[which(holidays$DATE =
I saw another post saying it could be a Unicode mismatch, but I retyped it twice with no luck. This is a copy-and-paste of another loop in the file that works perfectly.
I think what you are effectively trying to do is determine whether each bottleneck2 occurrence happens on a holiday. A merge/join is a better tool for that than a row-by-row loop. Since you are looking at two fields (early_startTime and endTime), we need two joins, but that shouldn't be expensive, and we can clean up the duplicated columns afterwards.
For this example, none of your bottleneck2 occurrences happens on a holiday, so I'm going to "nudge" two of them onto different holidays:
library(dplyr)

# just to "bump" a couple of the rows into a holiday occurrence,
# purely for demonstration: shift rows 1-3 by 0, 29, and 31 days
bottleneck2mod <- bottleneck2 %>%
  mutate_if(~ inherits(., "POSIXt"),
            ~ . + c(0, 29, 31) * 86400)

bottleneck2mod %>%
  # add a "_date" column for each so that we can "join" on the
  # date-version of each timestamp
  mutate_at(vars(early_startTime, endTime),
            list(date = ~ trunc(as.Date(.)))) %>%
  # both joins bring in a REASON column, so dplyr suffixes them .x and .y
  left_join(holidays, by = c(early_startTime_date = "DATE")) %>%
  left_join(holidays, by = c(endTime_date = "DATE")) %>%
  # keep whichever REASON matched, then drop the helper columns
  mutate(REASON = coalesce(REASON.x, REASON.y)) %>%
  select(-REASON.x, -REASON.y, -ends_with("_date"))
# startTime endTime impact impactPercent impactSpeedDiff maxQueueLength tmcs early_startTime REASON
# 1 2018-03-01 06:32:00 2018-03-01 06:48:00 92.17 184.15 3587.72 5.760530 110N04623, 110-04623, 110N04624, 110-04624, 110N04625, 110-04625, 110N04626, 110-04626, 110N04627 2018-03-01 06:27:00 <NA>
# 2 2018-03-30 07:51:00 2018-03-30 09:50:00 616.43 1495.17 25726.22 5.760530 110N04623, 110-04623, 110N04624, 110-04624, 110N04625, 110-04625, 110N04626, 110-04626, 110N04627 2018-03-30 07:46:00 Easter Travel
# 3 2018-04-01 10:14:00 2018-04-01 10:29:00 63.69 138.69 2616.01 4.829511 110N04623, 110-04623, 110N04624, 110-04624, 110N04625, 110-04625, 110N04626, 110-04626 2018-04-01 10:09:00 Easter
Now you have a REASON field (far right) that holds the holiday name, or NA otherwise.
From here, if you need to know which bottleneck2 rows match a holiday, just add filter(!is.na(REASON)) and you have all the matching bottlenecks.
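If you would rather not use joins, the same lookup can be sketched in base R with match(): look up each timestamp's date in the holiday dates, take the matching REASON, and prefer the start-date match when both hit. This is a swapped-in technique, not the dplyr pipeline above. holidays_small and bottleneck_small are hypothetical trimmed-down copies (three holiday rows from the question and the nudged timestamps printed above), and reason_for() is a hypothetical helper. The sketch also passes tz explicitly to as.Date(), so the date is the New York calendar date; for these morning timestamps it makes no difference, but for evening ones it can.

```r
tz <- "America/New_York"

# Hypothetical trimmed-down copies of the data from the question/answer
holidays_small <- data.frame(
  DATE   = as.Date(c("2018-03-30", "2018-03-31", "2018-04-01")),
  REASON = c("Easter Travel", "Easter Travel", "Easter"),
  stringsAsFactors = FALSE
)
bottleneck_small <- data.frame(
  early_startTime = as.POSIXct(c("2018-03-01 06:27:00", "2018-03-30 07:46:00",
                                 "2018-04-01 10:09:00"), tz = tz),
  endTime         = as.POSIXct(c("2018-03-01 06:48:00", "2018-03-30 09:50:00",
                                 "2018-04-01 10:29:00"), tz = tz)
)

# Hypothetical helper: holiday REASON for each timestamp's local (New York)
# calendar date, or NA when that date is not in holidays_small
reason_for <- function(timestamps) {
  holidays_small$REASON[match(as.Date(timestamps, tz = tz), holidays_small$DATE)]
}

start_reason <- reason_for(bottleneck_small$early_startTime)
end_reason   <- reason_for(bottleneck_small$endTime)

# Same idea as coalesce(REASON.x, REASON.y): prefer the start-date match
bottleneck_small$REASON <- ifelse(is.na(start_reason), end_reason, start_reason)

# Same idea as filter(!is.na(REASON))
on_holiday <- bottleneck_small[!is.na(bottleneck_small$REASON), ]
on_holiday

# Why tz is explicit: 21:00 in New York on 2018-03-29 is already
# 2018-03-30 in UTC, so the "date" depends on the time zone used
late <- as.POSIXct("2018-03-29 21:00:00", tz = tz)
as.Date(late, tz = tz)     # "2018-03-29"
as.Date(late, tz = "UTC")  # "2018-03-30"
```

match() returns the position of the first matching date (or NA), so indexing REASON with it gives one holiday name per timestamp, which is all the two left_join() calls are doing here.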
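To answer your question about why the syntax is invalid: first, the = should be ==. Inside a function call, = is reserved for naming arguments, so an expression like holidays$DATE cannot appear on its left; that is the "unexpected '='" error. But even after fixing = to ==, this still fails to parse: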
holiday_match <- lapply(1:nrow(bottleneck2), function(x) {
bottleneck_row <- bottleneck2[x,]
holidays[which(holidays$DATE == as.Date(bottleneck_row$early_startTime) |
holidays$DATE == as.Date(bottleneck_row$endTime) == TRUE),]
})
Let's drill inside:
holidays[which(holidays$DATE == as.Date(bottleneck_row$early_startTime) |
holidays$DATE == as.Date(bottleneck_row$endTime) == TRUE),]
Specifically,
which(holidays$DATE == as.Date(bottleneck_row$early_startTime) |
holidays$DATE == as.Date(bottleneck_row$endTime) == TRUE)
Let's remove the first half of the | expression:
which(holidays$DATE == as.Date(bottleneck_row$endTime) == TRUE)
# ...
holidays$DATE == as.Date(bottleneck_row$endTime) == TRUE
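Unlike arithmetic operators (e.g., +) and assignment (<-), == does not "cascade" (chain). R's parser treats the comparison operators as non-associative, so a == b == c is a syntax error rather than being read as (a == b) == c: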
TRUE == TRUE == TRUE
# Error: unexpected '==' in "TRUE == TRUE =="
(TRUE == TRUE) == TRUE
# [1] TRUE
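You can check both syntax problems without any data by handing the code to parse() as a string: parse() only builds the expression and evaluates nothing. parses() is a hypothetical helper, and x, y, and d are placeholder names that never need to exist. The first pair is the non-chaining == above; the last pair reproduces the original = typo.

```r
# Hypothetical helper: TRUE if `code` is syntactically valid R.
# parse() does not evaluate anything, so undefined names are fine here.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

parses("x == y == TRUE")      # FALSE: comparisons do not chain
parses("(x == y) == TRUE")    # TRUE:  explicit grouping is fine
parses("x <- y <- TRUE")      # TRUE:  assignment does chain
parses("which(d$DATE = x)")   # FALSE: the original '=' typo
parses("which(d$DATE == x)")  # TRUE
```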
So a literal fix would be
holiday_match <- lapply(1:nrow(bottleneck2), function(x) {
  bottleneck_row <- bottleneck2[x,]
  # group the whole | expression before comparing it to TRUE
  holidays[which((holidays$DATE == as.Date(bottleneck_row$early_startTime) |
                  holidays$DATE == as.Date(bottleneck_row$endTime)) == TRUE),]
})
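but since == TRUE is completely unnecessary (each == comparison already returns TRUE/FALSE, and which() returns the positions of the TRUE values), this can be reduced to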
holiday_match <- lapply(1:nrow(bottleneck2), function(x) {
bottleneck_row <- bottleneck2[x,]
holidays[which(holidays$DATE == as.Date(bottleneck_row$early_startTime) |
holidays$DATE == as.Date(bottleneck_row$endTime)),]
})
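A quick check of why == TRUE can be dropped, and why its placement matters: comparing a logical vector to TRUE returns the same vector, but comparing the output of which() (integer positions) to TRUE tests whether each position equals 1, because TRUE is coerced to 1. The vector x is made up for illustration.

```r
x <- c(1, 5, 7)

# A comparison already yields TRUE/FALSE, so "== TRUE" changes nothing
identical(x == 5, (x == 5) == TRUE)  # TRUE

# which() returns positions, not logicals
which(x == 5)                        # 2

# Comparing those positions to TRUE really asks "is the position 1?"
which(x == 5) == TRUE                # FALSE

# So index with which(<logical>), not with which(...) == TRUE
x[which(x == 5)]                     # 5
x[which(x == 5) == TRUE]             # numeric(0)
```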
holiday_match
# [[1]]
# [1] DATE REASON
# <0 rows> (or 0-length row.names)
# [[2]]
# [1] DATE REASON
# <0 rows> (or 0-length row.names)
# [[3]]
# [1] DATE REASON
# <0 rows> (or 0-length row.names)
No matches, because none of the rows in your sample data fall on a holiday. If you use my "nudged" data from above, then:
holiday_match <- lapply(1:nrow(bottleneck2mod), function(x) {
bottleneck_row <- bottleneck2mod[x,]
holidays[which(holidays$DATE == as.Date(bottleneck_row$early_startTime) |
holidays$DATE == as.Date(bottleneck_row$endTime)),]
})
holiday_match
# [[1]]
# [1] DATE REASON
# <0 rows> (or 0-length row.names)
# [[2]]
# DATE REASON
# 11 2018-03-30 Easter Travel
# [[3]]
# DATE REASON
# 13 2018-04-01 Easter