Sas

Reputation: 37

Testing and density plot across multiple columns

I have a list of restaurants and their star rating:

Restaurant     Question               1.star  2.stars ...etc

McDonalds      How was the food?      5       6       ...
McDonalds      How were the drinks?   3       4
McDonalds      How were the workers?  2       7
Burger_King    How was the food?      4       11
Burger_King    How were the drinks?   9       3
Burger_King    How were the workers?  12      1

1. How do I perform a t-test to determine whether people only use the 1-star and 5-star ratings?

2. How do I graph a density distribution of the star ratings?

3. In general, how do you graph across multiple columns, e.g. col_1 has value, col_2 has frequency?

tribble for convenience:

df <- tribble(
  ~restaurant, ~question,  ~one_star, ~two_star, ~three_star, ~four_star, ~five_star, ~average,

  "McDonalds", "How was the food?",  5, 6, 8, 2, 9, (5*1 + 6*2 + 8*3 + 2*4 + 5*9)/(5 + 6 + 8 + 2 + 9),
  "McDonalds", "How were the drinks?",  9, 8, 7, 5, 1, (9*1 + 8*2 + 7*3 + 5*4 + 5*1)/(9 + 8 + 7 + 5 + 1),
  "McDonalds", "How were the drinks?",  9, 8, 7, 5, 1, (9*1 + 8*2 + 7*3 + 5*4 + 5*1)/(9 + 8 + 7 + 5 + 1),
  "BurgerKing", "How was the food?",  2, 6, 8, 2, 9, (2*1 + 6*2 + 8*3 + 2*4 + 5*9)/(2 + 6 + 8 + 2 + 9),
  "BurgerKing", "How were the drinks?",  1, 4, 8, 5, 1, (1*1 + 4*2 + 8*3 + 5*4 + 5*1)/(1 + 4 + 8 + 5 + 1),
  "BurgerKing", "How were the drinks?",  4, 7, 2, 5, 1, (4*1 + 7*2 + 2*3 + 5*4 + 5*1)/(4 + 7 + 2 + 5 + 1)
)

Edit: As requested, here is my attempt:

#Note: this only works because it truncates the rest of the dataframe. Unaware of alternatives
#Step 1: Transform from wide to long
ratingdf <-  
  df %>%
  select(one_star:five_star) %>%
  pivot_longer(one_star:five_star, names_to = "rating")

#Step 2: Collapse values into total frequency
ratingdf <- 
  ratingdf %>%
  group_by(rating) %>%
  summarize(sum(value)) 

#Graph using ggplot
ratingdf %>%
  ggplot(aes(x = rating, y = `sum(value)`)) +
  geom_histogram(stat = "identity")

When I tried geom_density() on this, nothing was drawn, because the data holds frequencies per rating rather than the individual ratings.

Answers (1)

Georgery

Reputation: 8117

Preparation

library(tidyverse)

df <- tribble(
    ~restaurant, ~question,  ~one_star, ~two_star, ~three_star, ~four_star, ~five_star, ~average,

    "McDonalds", "How was the food?",  5, 6, 8, 2, 9, (5*1 + 6*2 + 8*3 + 2*4 + 5*9)/(5 + 6 + 8 + 2 + 9),
    "McDonalds", "How were the drinks?",  9, 8, 7, 5, 1, (9*1 + 8*2 + 7*3 + 5*4 + 5*1)/(9 + 8 + 7 + 5 + 1),
    "McDonalds", "How were the drinks?",  9, 8, 7, 5, 1, (9*1 + 8*2 + 7*3 + 5*4 + 5*1)/(9 + 8 + 7 + 5 + 1),
    "BurgerKing", "How was the food?",  2, 6, 8, 2, 9, (2*1 + 6*2 + 8*3 + 2*4 + 5*9)/(2 + 6 + 8 + 2 + 9),
    "BurgerKing", "How were the drinks?",  1, 4, 8, 5, 1, (1*1 + 4*2 + 8*3 + 5*4 + 5*1)/(1 + 4 + 8 + 5 + 1),
    "BurgerKing", "How were the drinks?",  4, 7, 2, 5, 1, (4*1 + 7*2 + 2*3 + 5*4 + 5*1)/(4 + 7 + 2 + 5 + 1)
)

# Reshape to long format: one row per restaurant, question and star level
df <- df %>%
    pivot_longer(cols = ends_with("_star"), names_to = "stars")

Question 1

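# 1 if the row is a 1-star or 5-star rating, 0 otherwise (one entry per row of df)
oneORfive <- df %>%
    mutate(oneORfive = as.numeric(stars == "one_star" | stars == "five_star")) %>%
    pull(oneORfive)

# Number of votes behind each row
occurrences <- df %>%
    pull(value)

# Repeat each indicator once per vote, then run the t-test on the indicator
# (note: by default t.test() compares the mean against 0)
allVotes_oneORfive <- unlist(map2(oneORfive, occurrences, rep_len))
t.test(allVotes_oneORfive)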

I doubt that a t-test is the right thing to do here, but I won't argue, since this is not a statistics forum.
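
If the underlying question is whether 1-star and 5-star ratings are used more often than the other levels (that is my assumption about the intent, the question does not say so explicitly), one alternative would be a chi-squared goodness-of-fit test against equal use of all five levels, or a binomial test of the 1-or-5 share against 2/5. A minimal sketch in base R with the counts from the example data; `counts`, `star_totals` and `extreme_votes` are names I made up for illustration:

```r
# Star counts from the example data, one row per survey row (same order as the tribble)
counts <- rbind(
    c(5, 6, 8, 2, 9),
    c(9, 8, 7, 5, 1),
    c(9, 8, 7, 5, 1),
    c(2, 6, 8, 2, 9),
    c(1, 4, 8, 5, 1),
    c(4, 7, 2, 5, 1)
)
colnames(counts) <- c("one_star", "two_star", "three_star", "four_star", "five_star")

# Total number of votes per star level
star_totals <- colSums(counts)

# Goodness of fit: are all five levels used equally often?
# (chisq.test() on a vector defaults to equal expected probabilities)
gof <- chisq.test(star_totals)
gof

# Share of 1-star and 5-star votes compared with the 2/5 expected under equal use
extreme_votes <- sum(star_totals[c("one_star", "five_star")])
binom.test(extreme_votes, sum(star_totals), p = 2 / 5)
```

Whether either test fits depends on what "only use" is supposed to mean, so treat this as a starting point rather than the answer.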

Question 2

A density plot does not make sense here, because the star rating is not a continuous scale. Also, the levels are not equidistant (IMO); otherwise you wouldn't have to ask question 1. Either way, maybe the following bar chart helps:

df %>%
    mutate(stars = factor(stars, levels = c("one_star", "two_star", "three_star", "four_star", "five_star"))) %>%
    ggplot(aes(stars, value)) +
    geom_bar(stat = "identity") +
    facet_grid(restaurant ~ question)

Question 3

I don't get this one. Maybe it makes sense to open a separate question about it.
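
One possible reading (an assumption on my part) is: given one column holding a value and another holding how often it occurred, how do you plot the distribution? A sketch under that assumption, where `freq_df`, `stars` and `n` are hypothetical names and the counts are the per-star totals of the example data. Either pass the frequency as the `weight` aesthetic, or expand the counts into one row per vote with `tidyr::uncount()`:

```r
library(tidyverse)

# Hypothetical layout: one column with the value, one with its frequency
freq_df <- tibble(
    stars = 1:5,
    n     = c(30, 39, 40, 24, 22)
)

# Option 1: geom_bar() sums the frequencies through the weight aesthetic
p_weight <- ggplot(freq_df, aes(x = stars, weight = n)) +
    geom_bar()

# Option 2: expand to one row per vote, then use any geom that expects raw observations
votes <- uncount(freq_df, weights = n)
p_raw <- ggplot(votes, aes(x = stars)) +
    geom_bar()
```

Printing `p_weight` or `p_raw` draws the plot; both should show the same bars. The expanded `votes` data can also be fed to geoms that need individual observations.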
