AnnaIC
AnnaIC

Reputation: 49

mix data in very different formats

I have two dataframes. In the first one I have the hotel opening periods: "1" below the month means that the hotel is open, "0" that the hotel is closed.

hotels <- data.frame(
hotel = c (1:5), jan = c(1,1,1,0,0), feb = c(1,1,1,1,1), 
mar = c(1,1,1,1,1), ap = c(0,0,1,1,1), may = c(0,0,0,0,0),
jun = c(0,0,0,0,0), jul = c(0,0,0,1,1), aug = c(1,1,1,1,1),
sep = c(1,1,1,0,0), oct = c(0,0,0,1,1), nov = c(1,1,1,1,1),
des = c(1,1,1,1,1)
)

The second dataframe belongs to the clients, where we have the arrival and departure of said clients to the hotels.

clientes<-data.frame(
  id_client = c("a","b","c","d", "e", "f", "g", "h", "i", "j"), 
  hotel = c(1,2,3,4,5,1,2,3,4,5),
  arrive = c("2019-01-08", "2019-05-04", "2019-06-25", "2019-11-24", "2019-03-04", "2019-06-01", "2019-05-04","2019-08-13", "2019-04-06", "2019-07-17"),
  departure = c("2019-01-10", "2019-05-08", "2019-07-05", "2019-12-01", "2019-03-08", "2019-06-09", "2019-05-10","2019-08-20", "2019-04-10", "2019-08-3")
  )

We want to know which clients have told the truth and which have not. If the dates do not match the overture period of the hotel, the client has not been honest. Therefore we will want a dataframe like this answer:

results<-data.frame(
  id_client = c("a","b","c","d", "e", "f", "g", "h", "i", "j"), 
  hotel = c(1,2,3,4,5,1,2,3,4,5),
  arrive = c("2019-01-08", "2019-05-04", "2019-06-25", "2019-11-24", "2019-03-04", "2019-06-01", "2019-05-04","2019-08-13", "2019-04-06", "2019-07-17"),
  honest = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
  )

Upvotes: 1

Views: 31

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388862

One way using mapply would be to extract the month departure and arrive columns. Pass this with hotel id to mapply, subset the relevant hotel from this info and check if the values have 1 for all the months client stayed in hotel.

clientes$departure <- as.Date(clientes$departure)
clientes$arrive <- as.Date(clientes$arrive)


clientes$honest <- mapply(function(x, y, z) 
 all(hotels[hotels$hotel == x, c(y, z)] == 1), clientes$hotel, 
    format(clientes$arrive, "%b"), format(clientes$departure, "%b"))


clientes
#   id_client hotel     arrive  departure honest
#1          a     1 2019-01-08 2019-01-10   TRUE
#2          b     2 2019-05-04 2019-05-08  FALSE
#3          c     3 2019-06-25 2019-07-05  FALSE
#4          d     4 2019-11-24 2019-12-01   TRUE
#5          e     5 2019-03-04 2019-03-08   TRUE
#6          f     1 2019-06-01 2019-06-09  FALSE
#7          g     2 2019-05-04 2019-05-10  FALSE
#8          h     3 2019-08-13 2019-08-20   TRUE
#9          i     4 2019-04-06 2019-04-10   TRUE
#10         j     5 2019-07-17 2019-08-03   TRUE

For this solution to work you need to do some changes to your dataset. The column names in your hotel dataset should be same as month.abb on your system so that they can be matched. I changed them manually here.

data

hotels <- data.frame(
  hotel = c (1:5), Jan = c(1,1,1,0,0), Feb = c(1,1,1,1,1), 
  Mar = c(1,1,1,1,1), Apr = c(0,0,1,1,1), May = c(0,0,0,0,0),
  Jun = c(0,0,0,0,0), Jul = c(0,0,0,1,1), Aug = c(1,1,1,1,1),
  Sep = c(1,1,1,0,0), Oct = c(0,0,0,1,1), Nov = c(1,1,1,1,1),
  Dec = c(1,1,1,1,1))

clientes<-data.frame(
id_client = c("a","b","c","d", "e", "f", "g", "h", "i", "j"), 
 hotel = c(1,2,3,4,5,1,2,3,4,5),
  arrive = c("2019-01-08", "2019-05-04", "2019-06-25", "2019-11-24", "2019-03-04", 
  "2019-06-01", "2019-05-04","2019-08-13", "2019-04-06", "2019-07-17"),
departure = c("2019-01-10", "2019-05-08", "2019-07-05", "2019-12-01","2019-03-08", 
  "2019-06-09", "2019-05-10","2019-08-20", "2019-04-10", "2019-08-03"))

Upvotes: 1

Related Questions