Reputation: 37
I have a list of restaurants and their star rating:
Restaurant    Question                 1 star   2 stars   ...etc
McDonalds     How was the food?        5        6         ...
McDonalds     How were the drinks?     3        4
McDonalds     How were the workers?    2        7
Burger_King   How was the food?        4        11
Burger_King   How were the drinks?     9        3
Burger_King   How were the workers?    12       1
1. How do I perform a t-test to determine whether people only use the 1-star and 5-star ratings?
2. How do I graph a density distribution of the star ratings?
3. In general, how do you graph across multiple columns, e.g. col_1 has value, col_2 has frequency?
A tribble for convenience:
library(tidyverse)
df <- tribble(
  ~restaurant, ~question, ~one_star, ~two_star, ~three_star, ~four_star, ~five_star, ~average,
  "McDonalds", "How was the food?", 5, 6, 8, 2, 9, (5*1 + 6*2 + 8*3 + 2*4 + 5*9)/(5 + 6 + 8 + 2 + 9),
  "McDonalds", "How were the drinks?", 9, 8, 7, 5, 1, (9*1 + 8*2 + 7*3 + 5*4 + 5*1)/(9 + 8 + 7 + 5 + 1),
  "McDonalds", "How were the drinks?", 9, 8, 7, 5, 1, (9*1 + 8*2 + 7*3 + 5*4 + 5*1)/(9 + 8 + 7 + 5 + 1),
  "BurgerKing", "How was the food?", 2, 6, 8, 2, 9, (2*1 + 6*2 + 8*3 + 2*4 + 5*9)/(2 + 6 + 8 + 2 + 9),
  "BurgerKing", "How were the drinks?", 1, 4, 8, 5, 1, (1*1 + 4*2 + 8*3 + 5*4 + 5*1)/(1 + 4 + 8 + 5 + 1),
  "BurgerKing", "How were the drinks?", 4, 7, 2, 5, 1, (4*1 + 7*2 + 2*3 + 5*4 + 5*1)/(4 + 7 + 2 + 5 + 1)
)
Edit: As requested, here is my attempt:
# Note: this only works because it truncates the rest of the dataframe. Unaware of alternatives
# Step 1: Transform from wide to long
ratingdf <-
  df %>%
  select(one_star:five_star) %>%
  pivot_longer(one_star:five_star, names_to = "rating")
# Step 2: Collapse values into total frequency
ratingdf <-
  ratingdf %>%
  group_by(rating) %>%
  summarize(sum(value))
# Graph using ggplot
ratingdf %>%
  ggplot(aes(x = rating, y = `sum(value)`)) +
  geom_histogram(stat = "identity")
When I tried to use geom_density() on this, it does not show anything, because the data contains the frequency of each rating rather than the individual ratings.
Upvotes: 0
Views: 74
Reputation: 8117
Preparation
library(tidyverse)
df <- tribble(
  ~restaurant, ~question, ~one_star, ~two_star, ~three_star, ~four_star, ~five_star, ~average,
  "McDonalds", "How was the food?", 5, 6, 8, 2, 9, (5*1 + 6*2 + 8*3 + 2*4 + 5*9)/(5 + 6 + 8 + 2 + 9),
  "McDonalds", "How were the drinks?", 9, 8, 7, 5, 1, (9*1 + 8*2 + 7*3 + 5*4 + 5*1)/(9 + 8 + 7 + 5 + 1),
  "McDonalds", "How were the drinks?", 9, 8, 7, 5, 1, (9*1 + 8*2 + 7*3 + 5*4 + 5*1)/(9 + 8 + 7 + 5 + 1),
  "BurgerKing", "How was the food?", 2, 6, 8, 2, 9, (2*1 + 6*2 + 8*3 + 2*4 + 5*9)/(2 + 6 + 8 + 2 + 9),
  "BurgerKing", "How were the drinks?", 1, 4, 8, 5, 1, (1*1 + 4*2 + 8*3 + 5*4 + 5*1)/(1 + 4 + 8 + 5 + 1),
  "BurgerKing", "How were the drinks?", 4, 7, 2, 5, 1, (4*1 + 7*2 + 2*3 + 5*4 + 5*1)/(4 + 7 + 2 + 5 + 1)
)
df <- df %>%
  pivot_longer(cols = ends_with("_star"),
               names_to = "stars")
Question 1
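# 1 if the row holds one-star or five-star counts, 0 otherwise
oneORfive <- df %>%
  mutate(oneORfive = as.numeric(stars == "one_star" | stars == "five_star")) %>%
  pull(oneORfive)
# Number of votes in each row
occurrences <- df %>%
  pull(value)
# Repeat each indicator once per vote, giving one 0/1 entry per vote
allVotes_oneORfive <- unlist(map2(oneORfive, occurrences, rep_len))
# One-sample t-test on the 0/1 votes (t.test() compares the mean to 0 by default)
t.test(allVotes_oneORfive)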
I doubt that a t-test is the right thing to do here, but I won't argue, since this is not a statistics forum.
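One thing worth knowing: with a single vector and no other arguments, t.test() tests whether the mean differs from 0 (mu = 0). Since the mean of the 0/1 vector is simply the share of 1-star and 5-star votes, that default only asks whether the share is non-zero. Below is a minimal sketch with a different reference value. The `votes` data and the value mu = 0.4 (2 of 5 levels, if all levels were used equally often) are my own assumptions for illustration, not something from the question.

```r
library(tidyverse)

# Hypothetical long-format counts, shaped like `df` after pivot_longer()
votes <- tribble(
  ~stars,       ~value,
  "one_star",   5,
  "two_star",   6,
  "three_star", 8,
  "four_star",  2,
  "five_star",  9
)

# One 0/1 entry per vote: 1 for a 1-star or 5-star rating, 0 otherwise
is_extreme <- rep(
  as.numeric(votes$stars %in% c("one_star", "five_star")),
  times = votes$value
)

# Share of votes that are 1-star or 5-star
mean(is_extreme)

# mu = 0.4 is an assumed reference value, not a recommendation
t.test(is_extreme, mu = 0.4)
```

Whether a t-test on a 0/1 vector is appropriate at all is still a statistics question; the sketch only shows what the reference value does.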
Question 2
A density plot does not make sense here, because star ratings are not on a continuous scale. Also, the levels are not equidistant (IMO); otherwise you wouldn't have to ask question 1. Either way, maybe the following bar chart can help you as well:
df %>%
  mutate(stars = factor(stars, levels = c("one_star", "two_star", "three_star", "four_star", "five_star"))) %>%
  ggplot(aes(stars, value)) +
  geom_bar(stat = "identity") +
  facet_grid(restaurant ~ question)
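If the point of the density plot was to compare the shape of the distributions rather than the raw counts, one option is to convert the counts to shares within each group before plotting. This is only a sketch: the `votes` data, the reduced set of star levels, and the grouping by restaurant alone are assumptions made to keep the example short.

```r
library(tidyverse)

# Hypothetical long-format counts, shaped like `df` after pivot_longer()
votes <- tribble(
  ~restaurant,  ~stars,       ~value,
  "McDonalds",  "one_star",   5,
  "McDonalds",  "three_star", 8,
  "McDonalds",  "five_star",  9,
  "BurgerKing", "one_star",   2,
  "BurgerKing", "three_star", 8,
  "BurgerKing", "five_star",  9
)

# Convert counts to shares within each restaurant
shares <- votes %>%
  mutate(stars = factor(stars, levels = c("one_star", "three_star", "five_star"))) %>%
  group_by(restaurant) %>%
  mutate(share = value / sum(value)) %>%
  ungroup()

# Bars now sum to 1 within each panel, so panels are directly comparable
p <- ggplot(shares, aes(stars, share)) +
  geom_col() +
  facet_wrap(~ restaurant)
p
```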
Question 3
I don't get this one. Maybe it makes sense to open a separate question about it.
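One possible reading is that the data has one column with the value and a second column with how often that value occurred. Under that assumption, here is a minimal sketch of two ways to plot it. The `counts` tibble and its column names `value` and `freq` are made up for illustration.

```r
library(tidyverse)

# Hypothetical data: a value column and a frequency column
counts <- tibble(
  value = c(1, 2, 3, 4, 5),
  freq  = c(5, 6, 8, 2, 9)
)

# Option 1: map the frequency column to y and draw the bars directly
p_col <- ggplot(counts, aes(x = value, y = freq)) +
  geom_col()

# Option 2: expand to one row per observation with tidyr::uncount(),
# so that geoms which count rows themselves can be used
observations <- counts %>%
  uncount(freq)

p_bar <- ggplot(observations, aes(x = value)) +
  geom_bar()
```

Both plots should show the same bar heights. The second form is the one to use with geoms that expect raw observations instead of precomputed frequencies.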
Upvotes: 2