Reputation: 799
I am trying to classify instances in a dataframe by Holidays
or Normal
.
I have the dates that must be categorized as Holidays
in a list/dataframe
object and dates that I want to categorize in another test object.
To be classified as Holidays
, besides being between the dates that must be categorized in this way, there is a Condition
in one of the list/daraframe
columns that must be 1
instead of 0
(i.e. an instance with a date that is actually between Holidays
dates should be labelled as Normal
if and only if the corresponding Condition
is 0
).
Object containing database of days that should be labelled as Holidays
:
holidays2015 <- list(list("2015-01-01",1,1,1),
list("2015-01-06",0,1,1),
list("2015-03-19",0,1,1),
list("2015-04-02",0,1,1),
list("2015-04-03",0,1,1),
list("2015-05-01",1,1,1),
list("2015-05-02",0,1,1),
list("2015-05-15",0,1,1),
list("2015-06-04",0,1,1),
list("2015-08-15",1,1,0),
list("2015-10-12",1,1,1),
list("2015-11-09",0,1,1),
list("2015-12-08",1,1,0),
list("2015-12-24",0,0,1),
list("2015-12-25",1,1,0),
list("2015-12-31",0,0,1))
holidays2014 <- list(list("2014-01-01",1,1,1),
list("2014-01-06",0,1,1),
list("2014-04-17",0,1,1),
list("2014-04-18",0,1,1),
list("2014-05-01",1,1,1),
list("2014-05-02",0,1,0),
list("2014-05-15",0,1,1),
list("2014-06-19",0,1,1),
list("2014-08-15",1,1,1),
list("2014-11-01",1,1,0),
list("2014-11-10",0,1,1),
list("2014-12-06",1,1,1),
list("2014-12-08",1,1,0),
list("2014-12-25",1,1,1))
totalholidays <- list(holidays2015, holidays2014)
dfholidays <- lapply(totalholidays, function(x) data.table::rbindlist(x))
dfholidays <- data.table::rbindlist(dfholidays)
names(dfholidays) <- c("Date", "V2", "V3", "Condition")
Dates that I want to label:
mytestingdates <- as.data.frame(list("Date" = c("2014-01-07", "2014-08-15",
"2015-06-04", "2015-08-15")))
My working solution is a for bucle:
SlowWay
holidaysvector <- c()
for (ii in 1:nrow(mytestingdates)){
if (mytestingdates$Date[ii] %in% dfholidays$Date){
tmp <- which(dfholidays$Date == mytestingdates$Date[ii])
if (dfholidays$Condition[tmp] == 1) {
holidaysvector <- c(holidaysvector, "Holidays")
} else { holidaysvector <- c(holidaysvector, "Normal T.1") }
} else { holidaysvector <- c(holidaysvector, "Normal T.2") }
}
mytestingdates$forsolution <- holidaysvector
rm(tmp)
But I would like a more efficient solutions. I have tried some R options but failed:
R desired looking alike solution:
mytestingdates$MyRtry <- ifelse(mytestingdates$Date %in% dfholidays$Date,
ifelse(dfholidays$Condition == 1, "Holiday", "Normal T.1"), "Normal T.2")
Date MyRtry forsolution
1 2014-01-07 Normal T.2 Normal T.2
2 2014-08-15 Holiday Holidays
3 2015-06-04 Holiday Holidays
4 2015-08-15 Holiday Normal T.1
Note that instance no.4 is in the Holidays
object but its condition
is 0 so it is labbeled as Normal
day, this is missed in my R solution.
Any Idea? Any advice in clean code or programming techniques derived from my code will be very much appreaciated.
Upvotes: 0
Views: 370
Reputation: 1999
This solution does not distinguish between NormalT1 and NormalT2, but it is quite simple:
mytestingdates["classifier"] <- ifelse(mytestingdates$Date %in% dfholidays[dfholidays$Condition==1]$Date,"Holiday", "Normal")
mytestingdates
Date classifier
1 2014-01-07 Normal
2 2014-08-15 Holiday
3 2015-06-04 Holiday
4 2015-08-15 Normal
Upvotes: 1
Reputation: 6132
Are you open to a dplyr solution?
library(dplyr)
mytestingdates %>%
left_join(dfholidays) %>%
mutate(forsolution = ifelse(is.na(Condition), "Normal T.2", ifelse(Condition == 0, "Normal T.1", "Holidays")))
In here dfholidays joins mytestingdates. If a date in mytestingdates is not in dfholidays, it merges only NAs for those dates. So then you can check whether Condition is NA, and if so, you set forsolution to "Normal T.2". Subsequently, you check whether Condition == 0, and if so, you let forsolution be "Normal T.1". In the other cases forsolution will be "Holidays".
Date V2 V3 Condition forsolution
1 2014-01-07 NA NA NA Normal T.2
2 2014-08-15 1 1 1 Holidays
3 2015-06-04 0 1 1 Holidays
4 2015-08-15 1 1 0 Normal T.1
UPDATE: shorter would be:
mytestingdates %>%
left_join(dfholidays) %>%
mutate(forsolution = case_when(is.na(Condition) ~ "Normal T.2", Condition == 0 ~ "Normal T.1", TRUE ~ "Holidays"))
Upvotes: 1