alvaropr
alvaropr

Reputation: 799

Find date elements in Dataframe Column that matches specific condition in R

I am trying to classify instances in a dataframe by Holidays or Normal.

I have the dates that must be categorized as Holidays in a list/dataframe object and dates that I want to categorize in another test object.

To be classified as Holidays, besides being between the dates that must be categorized in this way, there is a Condition in one of the list/daraframe columns that must be 1 instead of 0 (i.e. an instance with a date that is actually between Holidays dates should be labelled as Normal if and only if the corresponding Condition is 0).

Object containing database of days that should be labelled as Holidays:

holidays2015 <- list(list("2015-01-01",1,1,1),
                     list("2015-01-06",0,1,1),
                     list("2015-03-19",0,1,1),
                     list("2015-04-02",0,1,1),
                     list("2015-04-03",0,1,1),
                     list("2015-05-01",1,1,1),
                     list("2015-05-02",0,1,1),
                     list("2015-05-15",0,1,1),
                     list("2015-06-04",0,1,1),
                     list("2015-08-15",1,1,0),
                     list("2015-10-12",1,1,1),
                     list("2015-11-09",0,1,1),
                     list("2015-12-08",1,1,0),
                     list("2015-12-24",0,0,1),
                     list("2015-12-25",1,1,0),
                     list("2015-12-31",0,0,1))

holidays2014 <- list(list("2014-01-01",1,1,1),
                     list("2014-01-06",0,1,1),
                     list("2014-04-17",0,1,1),
                     list("2014-04-18",0,1,1),
                     list("2014-05-01",1,1,1),
                     list("2014-05-02",0,1,0),
                     list("2014-05-15",0,1,1),
                     list("2014-06-19",0,1,1),
                     list("2014-08-15",1,1,1),
                     list("2014-11-01",1,1,0),
                     list("2014-11-10",0,1,1),
                     list("2014-12-06",1,1,1),
                     list("2014-12-08",1,1,0),
                     list("2014-12-25",1,1,1))
totalholidays <- list(holidays2015, holidays2014)
dfholidays <- lapply(totalholidays, function(x) data.table::rbindlist(x))
dfholidays <- data.table::rbindlist(dfholidays)
names(dfholidays) <- c("Date", "V2", "V3", "Condition")

Dates that I want to label:

mytestingdates <- as.data.frame(list("Date" = c("2014-01-07", "2014-08-15", 
"2015-06-04", "2015-08-15")))

My working solution is a for bucle:

SlowWay

holidaysvector <- c()
for (ii in 1:nrow(mytestingdates)){
  if (mytestingdates$Date[ii] %in% dfholidays$Date){
    tmp <- which(dfholidays$Date == mytestingdates$Date[ii])
    if (dfholidays$Condition[tmp] == 1) {
      holidaysvector <- c(holidaysvector, "Holidays")
    } else { holidaysvector <- c(holidaysvector, "Normal T.1") }
    } else { holidaysvector <- c(holidaysvector, "Normal T.2") }
}
mytestingdates$forsolution <- holidaysvector
rm(tmp)

But I would like a more efficient solutions. I have tried some R options but failed:

R desired looking alike solution:

mytestingdates$MyRtry <- ifelse(mytestingdates$Date %in% dfholidays$Date, 
ifelse(dfholidays$Condition == 1, "Holiday", "Normal T.1"), "Normal T.2")

Desired solution

        Date     MyRtry forsolution
1 2014-01-07 Normal T.2  Normal T.2
2 2014-08-15    Holiday    Holidays
3 2015-06-04    Holiday    Holidays
4 2015-08-15    Holiday  Normal T.1

Note that instance no.4 is in the Holidays object but its condition is 0 so it is labbeled as Normal day, this is missed in my R solution.

Any Idea? Any advice in clean code or programming techniques derived from my code will be very much appreaciated.

Upvotes: 0

Views: 370

Answers (2)

otwtm
otwtm

Reputation: 1999

This solution does not distinguish between NormalT1 and NormalT2, but it is quite simple:

mytestingdates["classifier"] <- ifelse(mytestingdates$Date %in% dfholidays[dfholidays$Condition==1]$Date,"Holiday", "Normal")

mytestingdates

        Date classifier
1 2014-01-07     Normal
2 2014-08-15    Holiday
3 2015-06-04    Holiday
4 2015-08-15     Normal

Upvotes: 1

Lennyy
Lennyy

Reputation: 6132

Are you open to a dplyr solution?

library(dplyr)
mytestingdates %>% 
  left_join(dfholidays) %>% 
  mutate(forsolution = ifelse(is.na(Condition), "Normal T.2", ifelse(Condition == 0, "Normal T.1", "Holidays"))) 

In here dfholidays joins mytestingdates. If a date in mytestingdates is not in dfholidays, it merges only NAs for those dates. So then you can check whether Condition is NA, and if so, you set forsolution to "Normal T.2". Subsequently, you check whether Condition == 0, and if so, you let forsolution be "Normal T.1". In the other cases forsolution will be "Holidays".

        Date V2 V3 Condition forsolution
1 2014-01-07 NA NA        NA  Normal T.2
2 2014-08-15  1  1         1    Holidays
3 2015-06-04  0  1         1    Holidays
4 2015-08-15  1  1         0  Normal T.1

UPDATE: shorter would be:

mytestingdates %>% 
  left_join(dfholidays) %>% 
  mutate(forsolution = case_when(is.na(Condition) ~ "Normal T.2", Condition == 0 ~ "Normal T.1",  TRUE ~ "Holidays"))

Upvotes: 1

Related Questions