Reputation: 49
I have a dataframe CLIENTS with: Hotel number Nationality Date of arrive Date of departure Something like:
Client Nationality Hotel Dateofarrive DateofDeparture
Cl1 es h1 21/07/2019 24/07/2019
Cl2 es h1 23/07/2019 24/07/2019
Cl3 es h1 06/07/2019 10/07/2019
Cl4 es h2 05/07/2019 06/07/2019
Cl5 fr h3 01/07/2019 02/07/2019
Cl6 pt h1 07/07/2019 09/07/2019
And i want a dataframe Hotel and for each hotel of the dataframe i want all the clients that are of spanish nationality and all the nights they have sleept in the hotel, the same for french nationality and portuguese. Something like:
Hotel CliEspan Nights CliFrench Night CliPortug Night
H1 3 8 0 0 1 2
H2 1 5 1 1 0 0
Upvotes: 2
Views: 79
Reputation: 28675
Here is a data.table option using dcast to cast the data to wide format.
library(data.table)
setDT(df)
# convert to date and calculate nights
df[, Dateofarrive := as.Date(Dateofarrive, format = '%d/%m/%Y')]
df[, DateofDeparture := as.Date(DateofDeparture, format = '%d/%m/%Y')]
df[, nights := as.numeric(DateofDeparture - Dateofarrive)]
# dcast to wide format
new <- dcast(df, Hotel ~ Nationality, value.var = 'nights',
fun.aggregate = list(sum, length))
names(new) <- gsub('nights_length', 'clients', names(new))
new
# Hotel nights_sum_es nights_sum_fr nights_sum_pt clients_es clients_fr clients_pt
# 1: h1 8 0 2 3 0 1
# 2: h2 1 0 0 1 0 0
# 3: h3 0 1 0 0 1 0
Upvotes: 0
Reputation: 13309
Not sure why we have 5 nights for hotel h2
but another dplyr
possibility(can then spread
:
df %>%
group_by(Client,Hotel,Nationality) %>%
mutate_at(vars(contains("Date")),list(~lubridate::dmy(.))) %>%
summarise(Time = DateofDeparture- Dateofarrive) %>%
ungroup() %>%
group_by(Hotel, Nationality) %>%
mutate(Nights= as.numeric(sum(Time))) %>%
tidyr::spread(Hotel, Nights, fill=0) %>%
rename_at(vars(contains("h")),list(~paste0(.,"_nights")))
# A tibble: 6 x 6
# Groups: Nationality [3]
Client Nationality Time h1_nights h2_nights h3_nights
<chr> <chr> <drtn> <dbl> <dbl> <dbl>
1 Cl1 es 3 days 8 0 0
2 Cl2 es 1 days 8 0 0
3 Cl3 es 4 days 8 0 0
4 Cl4 es 1 days 0 1 0
5 Cl5 fr 1 days 0 0 1
6 Cl6 pt 2 days 2 0 0
Upvotes: 1
Reputation: 388962
An option using dplyr
and tidyr
. We convert Dateofarrive
and DateofDeparture
to actual Date object then calculate the difference in by subtracting two dates, group_by
Hotel
and Nationality
and sum
total number of days and total number of entries in each group. Convert to long format, combine columns and spread
it to wide format by filling missing values to 0.
library(dplyr)
library(tidyr)
df %>%
mutate_at(vars(Dateofarrive, DateofDeparture), as.Date, "%d/%m/%Y") %>%
mutate(days = as.integer(DateofDeparture - Dateofarrive)) %>%
group_by(Hotel, Nationality) %>%
summarise(total = sum(days),
n = n()) %>%
gather(key, value, total, n) %>%
unite(col, Nationality, key, sep = "_") %>%
spread(col, value, fill = 0)
Upvotes: 2