pivot_wider() function in tidyr

Question

I am trying to understand how the pivot_wider() function in tidyr works. I have bookings data and properties data, and I want to find out whether properties appeal to business travelers and tourists alike.

The steps I am trying to accomplish are:

First, converting the column for_business to a factor with the levels "business" and "tourist".
For each property and for business travelers and tourists separately, calculating the average review score.
Then, calculating the average review score difference between business travelers and tourists.

Code:

bookings %>%
  mutate(for_business = factor(for_business, labels = c("business", "tourist"))) %>%
  select(property_id, for_business) %>%
  mutate(avg_review_score = mean(review_score, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
  mutate(diff = business - tourist) %>%
  summarise(avg_diff = mean(diff, na.rm = TRUE))

When I run this, I get the following error:

Error: Problem with `mutate()` input `avg_review_score`.
x object 'review_score' not found
i Input `avg_review_score` is `mean(review_score, na.rm = TRUE)`.

> dput(head(bookings))
structure(list(booker_id = c("215934017ba98c09f30dedd29237b43dad5c7b5f", 
"7f590fd6d318248a48665f7f7db529aca40c84f5", "10f0f138e8bb1015d3928f2b7d828cbb50cd0804", 
"7b55021a4160dde65e31963fa55a096535bcad17", "6694a79d158c7818cd63831b71bac91286db5aff", 
"d0358740d5f15e85523f94ab8219f25d8c017347"), property_id = c(2668, 
4656, 4563, 4088, 2188, 4171), room_nights = c(4, 5, 6, 7, 4, 
2), price_per_night = c(91.4669561442773, 106.504997616816, 86.9913739625713, 
92.3656155139053, 104.838941902747, 109.981876495045), checkin_day = c("mon", 
"tue", "wed", "fri", "tue", "fri"), for_business = c(FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE), status = c("cancelled", "cancelled", 
"stayed", "stayed", "stayed", "cancelled"), review_score = c(NA, 
NA, 6.25812265672399, 5.953597754693, 6.43474489539585, NA)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

> dput(head(properties))
structure(list(property_id = c(2668, 4656, 4563, 4088, 2188, 
4171), destination = c("Brisbane", "Brisbane", "Brisbane", "Brisbane", 
"Brisbane", "Brisbane"), property_type = c("Hotel", "Hotel", 
"Apartment", "Apartment", "Apartment", "Apartment"), nr_rooms = c(32, 
39, 9, 9, 4, 5), facilities = c("airport shuttle,free wifi,garden,breakfast,pool,on-site restaurant", 
"on-site restaurant,pool,airport shuttle,breakfast,bbq,free wifi,spa", 
"laundry", "kitchen,laundry,free wifi", "parking,kitchen,bbq,free wifi,game console", 
"kitchen,pool,laundry,parking,free wifi,garden")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

akrun · Accepted Answer

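The error comes from the select() step: it keeps only property_id and for_business, so the review_score column that the next mutate() needs is no longer in the data. Include review_score in the select() as well. Two further changes are needed to get the per-property averages described in the question. First, group by property_id and for_business and use summarise() instead of an ungrouped mutate(), which would otherwise return a single overall mean on every row. Second, list the levels as c(TRUE, FALSE) so that TRUE is labeled "business" and FALSE is labeled "tourist".

bookings %>%
  # for_business is logical: TRUE -> "business", FALSE -> "tourist"
  mutate(for_business = factor(for_business, levels = c(TRUE, FALSE), 
      labels = c("business", "tourist"))) %>%
  select(property_id, for_business, review_score) %>%
  # one average per property and traveler type
  group_by(property_id, for_business) %>%
  summarise(avg_review_score = mean(review_score, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
  mutate(diff = business - tourist) %>%
  summarise(avg_diff = mean(diff, na.rm = TRUE))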
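
To see what pivot_wider() contributes in a pipeline like this, here is a minimal sketch on made-up data. The tibble `toy` and its values are invented for illustration and are not the asker's bookings. It assumes that for_business = TRUE means a business trip, and it averages per property and traveler type before pivoting so that each property ends up on one row with a `business` and a `tourist` column.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two properties, each with business and tourist bookings.
toy <- tibble(
  property_id  = c(1, 1, 1, 2, 2, 2),
  for_business = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  review_score = c(8, 6, 6, 7, NA, 5)
)

wide <- toy %>%
  # Assumption: TRUE is a business trip, FALSE a tourist trip.
  mutate(for_business = factor(for_business, levels = c(TRUE, FALSE),
                               labels = c("business", "tourist"))) %>%
  # Long format: one average per (property, traveler type) pair.
  group_by(property_id, for_business) %>%
  summarise(avg_review_score = mean(review_score, na.rm = TRUE),
            .groups = "drop") %>%
  # Wide format: the factor labels become column names,
  # the averages become the cell values.
  pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
  mutate(diff = business - tourist)

# Columns of wide: property_id, business, tourist, diff
print(wide)

avg_diff <- summarise(wide, avg_diff = mean(diff, na.rm = TRUE))
print(avg_diff)
```

Note that pivot_wider() only creates columns for values that actually occur in names_from. In the six rows shown by dput(head(bookings)) every booking has for_business = FALSE, so on that subset alone there would be no `business` column and `business - tourist` would fail. As far as I know, tidyr 1.2.0 and later accept names_expand = TRUE in pivot_wider() to create columns for unused factor levels as well; check the documentation of your installed version before relying on it.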
