I am trying to understand how the pivot_wider function in tidyr works. I have bookings data and properties data, and I am trying to find out whether properties appeal to business travelers and tourists alike.
The steps I am trying to accomplish start with converting for_business to a factor with the levels "business" and "tourist".
Code:
bookings %>%
  mutate(for_business = factor(for_business, labels = c("business", "tourist"))) %>%
  select(property_id, for_business) %>%
  mutate(avg_review_score = mean(review_score, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
  mutate(diff = business - tourist) %>%
  summarise(avg_diff = mean(diff, na.rm = TRUE))
Running this gives the following error:
Error: Problem with `mutate()` input `avg_review_score`.
x object 'review_score' not found
i Input `avg_review_score` is `mean(review_score, na.rm = TRUE)`.
> dput(head(bookings))
structure(list(booker_id = c("215934017ba98c09f30dedd29237b43dad5c7b5f",
"7f590fd6d318248a48665f7f7db529aca40c84f5", "10f0f138e8bb1015d3928f2b7d828cbb50cd0804",
"7b55021a4160dde65e31963fa55a096535bcad17", "6694a79d158c7818cd63831b71bac91286db5aff",
"d0358740d5f15e85523f94ab8219f25d8c017347"), property_id = c(2668,
4656, 4563, 4088, 2188, 4171), room_nights = c(4, 5, 6, 7, 4,
2), price_per_night = c(91.4669561442773, 106.504997616816, 86.9913739625713,
92.3656155139053, 104.838941902747, 109.981876495045), checkin_day = c("mon",
"tue", "wed", "fri", "tue", "fri"), for_business = c(FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE), status = c("cancelled", "cancelled",
"stayed", "stayed", "stayed", "cancelled"), review_score = c(NA,
NA, 6.25812265672399, 5.953597754693, 6.43474489539585, NA)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
> dput(head(properties))
structure(list(property_id = c(2668, 4656, 4563, 4088, 2188,
4171), destination = c("Brisbane", "Brisbane", "Brisbane", "Brisbane",
"Brisbane", "Brisbane"), property_type = c("Hotel", "Hotel",
"Apartment", "Apartment", "Apartment", "Apartment"), nr_rooms = c(32,
39, 9, 9, 4, 5), facilities = c("airport shuttle,free wifi,garden,breakfast,pool,on-site restaurant",
"on-site restaurant,pool,airport shuttle,breakfast,bbq,free wifi,spa",
"laundry", "kitchen,laundry,free wifi", "parking,kitchen,bbq,free wifi,game console",
"kitchen,pool,laundry,parking,free wifi,garden")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 1
The error comes from the select step: it keeps only two columns, while the next mutate step needs a column (review_score) that is no longer present in the selected data. It would be better to include that column in the select as well:
bookings %>%
  # map TRUE to "business" and FALSE to "tourist"
  mutate(for_business = factor(for_business, levels = c(TRUE, FALSE),
                               labels = c("business", "tourist"))) %>%
  # keep review_score so the next mutate() can find it
  select(property_id, for_business, review_score) %>%
  mutate(avg_review_score = mean(review_score, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
  mutate(diff = business - tourist) %>%
  summarise(avg_diff = mean(diff, na.rm = TRUE))
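One more thing to watch: mutate() without a preceding group_by() computes a single overall mean, so every row gets the same avg_review_score and pivot_wider has nothing useful to compare. If the goal is the average score per property and traveler type (an assumption on my part, suggested by the stray ungroup()), a minimal sketch could look like the following. The toy_bookings data is made up for illustration and is not taken from the question's dataset.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, only to show the shape of each step
toy_bookings <- tibble(
  property_id  = c(1, 1, 1, 2, 2, 2),
  for_business = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  review_score = c(8, 6, 9, 5, NA, 7)
)

per_property <- toy_bookings %>%
  # TRUE -> "business", FALSE -> "tourist"
  mutate(for_business = factor(for_business, levels = c(TRUE, FALSE),
                               labels = c("business", "tourist"))) %>%
  select(property_id, for_business, review_score) %>%
  # one average per property and traveler type
  group_by(property_id, for_business) %>%
  summarise(avg_review_score = mean(review_score, na.rm = TRUE),
            .groups = "drop") %>%
  # one row per property, one column per traveler type
  pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
  mutate(diff = business - tourist)

per_property

overall <- per_property %>%
  summarise(avg_diff = mean(diff, na.rm = TRUE))

overall
```

After the summarise step there is exactly one row per property_id and for_business combination, so pivot_wider can spread the two factor levels into the business and tourist columns without producing list-columns. With the toy data above, both properties end up with diff = -2, so avg_diff is -2.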
Upvotes: 2