I need to barplot and compare different items of a survey (columns in the dataset)

I'm analyzing a dataset for my master's thesis. The data come from a survey I created. I've searched as much as I can, but because not every item covers the full range of the scale, I'm running into length problems. The scale is a 7-point Likert scale: some items have values ranging from 1 to 7, others only from 2 to 6. If an item never received a 7, for example, its frequency table is shorter than that of an item whose responses cover the full range (Problem 1).

structure(list(AD_BORING_1 = c("3", "2", "4", "1", "6", "3", 
"7", "6", "2", "3", "5", "4", "6", "5", "5", "6", "5", "2", "2", 
"6", "2", "3", "5", "4", "5", "5", "1", "2", "4", "2", "3", "6", 
"5", "5", "3"), AD_IRRITATING_1 = c("3", "2", "2", "1", "7", 
"5", "6", "4", "5", "5", "1", "5", "4", "3", "5", "6", "5", "2", 
"2", "4", "5", "3", "2", "4", "3", "4", "1", "2", "4", "5", "4", 
"4", "7", "4", "2"), AD_DISTURBING_1 = c("3", "1", "3", "3", 
"4", "1", "3", "2", "2", "4", "1", "3", "4", "2", "1", "4", "2", 
"2", "2", "4", "1", "5", "1", "2", "2", "2", "1", "2", "4", "2", 
"4", "2", "4", "6", "2"), AD_CREDIBLE_1 = c("5", "5", "3", "2", 
"1", "2", "6", "3", "6", "3", "5", "4", "2", "3", "4", "1", "5", 
"3", "3", "2", "1", "3", "5", "3", "2", "4", "6", "6", "3", "1", 
"5", "6", "2", "3", "5"), AD_GOOD_1 = c("5", "5", "3", "2", "2", 
"5", "3", "4", "5", "2", "5", "2", "1", "5", "4", "2", "2", "5", 
"5", "2", "3", "5", "4", "4", "4", "4", "6", "4", "3", "2", "4", 
"4", "1", "4", "5"), AD_HONEST_1 = c("5", "3", "3", "2", "2", 
"1", "4", "3", "5", "2", "6", "1", "2", "2", "3", "2", "4", "3", 
"2", "2", "2", "3", "2", "4", "1", "3", "4", "3", "2", "2", "3", 
"5", "1", "4", "3"), AD_TRUTHFUL_1 = c("5", "3", "4", "2", "2", 
"1", "5", "3", "5", "2", "5", "2", "2", "3", "3", "2", "5", "3", 
"2", "1", "2", "2", "4", "5", "1", "3", "4", "4", "4", "1", "2", 
"3", "1", "1", "3"), AD_LIKEABLE_1 = c("5", "4", "3", "2", "2", 
"6", "2", "4", "5", "4", "4", "3", "3", "4", "3", "4", "5", "6", 
"7", "1", "2", "2", "2", "4", "1", "3", "6", "6", "2", "4", "1", 
"4", "1", "3", "5"), AD_ENJOYABLE_1 = c("5", "5", "3", "2", "2", 
"4", "2", "4", "5", "4", "5", "3", "2", "6", "3", "2", "5", "6", 
"7", "2", "2", "2", "4", "5", "2", "3", "7", "6", "3", "4", "4", 
"3", "1", "3", "4"), LIKE_1 = c("6", "5", "3", "2", "1", "4", 
"2", "3", "5", "3", "4", "3", "1", "4", "3", "3", "5", "5", "7", 
"1", "4", "5", "4", "4", "2", "4", "6", "6", "4", "3", "4", "4", 
"1", "2", "5")), row.names = c(NA, -35L), class = c("tbl_df", 
"tbl", "data.frame"))

Each row of the main dataset is one observation, and each column holds the scores for one item.

Another problem is that I have no idea how to plot all the items together so they can be compared in a simple barplot, like the picture below:

[Image: example barplot]

I tried with items of the same length using this code:

prova <- data.frame(table(A_DF_GIL$AD_BORING_1), table(A_DF_GIL$AD_IRRITATING_1))
barplot(as.matrix(prova))

but the result is still not what I need. Can anybody help me, please? Thank you.

Upvotes: 0

Views: 133

Answers (1)

Dan Adams

Reputation: 5254

Here's an updated response using your fuller dataset, now that I understand your goal better.

While processing the data I used a renaming function with a regex to clean up the column names, but this is optional. I also converted the scores to a factor so they are treated as ordinal, discrete data (which they are) rather than continuous. In the second example, however, I keep them numeric so that I can calculate a mean(), which is one option for decluttering the plot.
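
To make the optional renaming step easier to follow, here is a minimal sketch of what the regex does on a few column names copied from the question. The vector names old_names and new_names are only illustrative.

```r
library(stringr)

# Column names copied from the question's dataset
old_names <- c("AD_BORING_1", "AD_IRRITATING_1", "LIKE_1")

# (?<=_) is a lookbehind and (?=_) a lookahead, so only the text
# sitting between two underscores is extracted
new_names <- str_extract(old_names, "(?<=_)(.*)(?=_)")
new_names
```

LIKE_1 contains only one underscore, so the pattern finds nothing there and returns NA. That is presumably why the pipeline restricts the regex to the columns containing "AD" and renames LIKE_1 in a separate step.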

Given the amount of data, I opted for a stacked bar plot using geom_bar(position = "stack"), but try position = "dodge" to see the side-by-side version for yourself.
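
If you would rather stay with base R's barplot(), as in your attempt, the same stacked versus side-by-side choice can be sketched as follows. This is only an illustration on a hypothetical subset (the first ten responses of two items from your data), not a replacement for the ggplot2 code. It also shows one way around the unequal-length tables from Problem 1.

```r
# Hypothetical subset: the first ten responses of two items from the question
items <- data.frame(
  AD_BORING_1     = c("3", "2", "4", "1", "6", "3", "7", "6", "2", "3"),
  AD_IRRITATING_1 = c("3", "2", "2", "1", "7", "5", "6", "4", "5", "5")
)

# Fixing the levels to 1:7 makes every table length 7, even when a score
# never occurs, so the per-item counts bind into one 7 x 2 matrix
counts <- sapply(items, function(x) table(factor(x, levels = 1:7)))

# beside = FALSE stacks the scores within each item (like position = "stack")
barplot(counts, beside = FALSE, legend.text = rownames(counts))

# beside = TRUE draws them side by side (like position = "dodge")
barplot(counts, beside = TRUE, legend.text = rownames(counts))
```

Each column of counts is one item and each row one score, which is the layout barplot() expects for grouped bars.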

Also I commented out the line to facet by LIKE but you should try it out to see if that is more informative.

I also applied some aesthetics that I happen to like, mostly to demonstrate how much of the plot you can customize in {ggplot2}.

I applied Likert scale labels to the color scale, which you can customize if that's helpful or omit by dropping labels = likert_breaks.

library(tidyverse)

d <- structure(list(AD_BORING_1 = c("3", "2", "4", "1", "6", "3", "7", "6", "2", "3", "5", "4", "6", "5", "5", "6", "5", "2", "2", "6", "2", "3", "5", "4", "5", "5", "1", "2", "4", "2", "3", "6", "5", "5", "3"), AD_IRRITATING_1 = c("3", "2", "2", "1", "7", "5", "6", "4", "5", "5", "1", "5", "4", "3", "5", "6", "5", "2", "2", "4", "5", "3", "2", "4", "3", "4", "1", "2", "4", "5", "4", "4", "7", "4", "2"), AD_DISTURBING_1 = c("3", "1", "3", "3", "4", "1", "3", "2", "2", "4", "1", "3", "4", "2", "1", "4", "2", "2", "2", "4", "1", "5", "1", "2", "2", "2", "1", "2", "4", "2", "4", "2", "4", "6", "2"), AD_CREDIBLE_1 = c("5", "5", "3", "2", "1", "2", "6", "3", "6", "3", "5", "4", "2", "3", "4", "1", "5", "3", "3", "2", "1", "3", "5", "3", "2", "4", "6", "6", "3", "1", "5", "6", "2", "3", "5"), AD_GOOD_1 = c("5", "5", "3", "2", "2", "5", "3", "4", "5", "2", "5", "2", "1", "5", "4", "2", "2", "5", "5", "2", "3", "5", "4", "4", "4", "4", "6", "4", "3", "2", "4", "4", "1", "4", "5"), AD_HONEST_1 = c("5", "3", "3", "2", "2", "1", "4", "3", "5", "2", "6", "1", "2", "2", "3", "2", "4", "3", "2", "2", "2", "3", "2", "4", "1", "3", "4", "3", "2", "2", "3", "5", "1", "4", "3"), AD_TRUTHFUL_1 = c("5", "3", "4", "2", "2", "1", "5", "3", "5", "2", "5", "2", "2", "3", "3", "2", "5", "3", "2", "1", "2", "2", "4", "5", "1", "3", "4", "4", "4", "1", "2", "3", "1", "1", "3"), AD_LIKEABLE_1 = c("5", "4", "3", "2", "2", "6", "2", "4", "5", "4", "4", "3", "3", "4", "3", "4", "5", "6", "7", "1", "2", "2", "2", "4", "1", "3", "6", "6", "2", "4", "1", "4", "1", "3", "5"), AD_ENJOYABLE_1 = c("5", "5", "3", "2", "2", "4", "2", "4", "5", "4", "5", "3", "2", "6", "3", "2", "5", "6", "7", "2", "2", "2", "4", "5", "2", "3", "7", "6", "3", "4", "4", "3", "1", "3", "4"), LIKE_1 = c("6", "5", "3", "2", "1", "4", "2", "3", "5", "3", "4", "3", "1", "4", "3", "3", "5", "5", "7", "1", "4", "5", "4", "4", "2", "4", "6", "6", "4", "3", "4", "4", "1", "2", "5")), row.names = c(NA, -35L), class = c("tbl_df", 
"tbl", "data.frame"))

# make labels in case that helps
likert_breaks <- c("Strongly Disagree", "Somewhat Disagree", "Slightly Disagree", "Neutral", "Slightly Agree", "Somewhat Agree", "Strongly Agree")

# process and plot as stacked bar plot with optional faceting
d %>%
  rename_with(~str_extract(.x, "(?<=_)(.*)(?=_)"), contains("AD")) %>%
  rename(LIKE = LIKE_1) %>%
  mutate(across(everything(), as.integer)) %>%
  mutate(respondent = row_number()) %>%
  pivot_longer(-c(respondent, LIKE), names_to = "adjective", values_to = "likert") %>%
  mutate(likert = factor(likert)) %>%
  ggplot(aes(adjective, fill = likert)) +
  geom_bar(stat = "count", position = "stack") +
  # facet_wrap(~LIKE) +
  scale_fill_viridis_d(option = "A", begin = 0, end = 0.85, labels = likert_breaks) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90))
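
In the plot above, the labels are handed to the color scale and matched to the factor levels by position, so they may no longer line up if some score never occurs in the data. One possible alternative, sketched here with hypothetical scores, is to attach the labels when the factor is created. The names likert_labels, scores and likert are illustrative.

```r
# Same wording as likert_breaks above, repeated so the sketch runs on its own
likert_labels <- c("Strongly Disagree", "Somewhat Disagree", "Slightly Disagree",
                   "Neutral", "Slightly Agree", "Somewhat Agree", "Strongly Agree")

# Hypothetical item that never received a 1, 5 or 7
scores <- c(2L, 3L, 3L, 6L, 4L)

# levels fixes the full 1-7 scale; labels ties each score to its wording
likert <- factor(scores, levels = 1:7, labels = likert_labels)

levels(likert)        # all seven labels, including the unused ones
as.character(likert)  # each response shown with its label
```

With the labels stored in the factor itself, the labels argument of the fill scale would no longer be needed.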

Another option to keep things more compact is to show only the mean score for each adjective. In this case the scores must stay numeric so that you can apply a summarizing function like mean().

# just show mean likert scores for each
d %>%
  rename_with(~str_extract(.x, "(?<=_)(.*)(?=_)"), contains("AD")) %>%
  rename(LIKE = LIKE_1) %>%
  mutate(across(everything(), as.integer)) %>%
  mutate(respondent = row_number()) %>%
  pivot_longer(-c(respondent, LIKE), names_to = "adjective", values_to = "likert") %>%
  group_by(adjective) %>%
  summarise(likert = mean(likert)) %>%
  ggplot(aes(reorder(adjective, -likert), likert)) +
  geom_col() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90))

Created on 2022-02-08 by the reprex package (v2.0.1)
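
As a quick cross-check of the means without the tidyverse, the same summary can be sketched in base R. The data frame below is a hypothetical subset (the first five responses of three items from the question), and item_means is an illustrative name.

```r
# Hypothetical subset: the first five responses of three items
items <- data.frame(
  BORING     = c("3", "2", "4", "1", "6"),
  DISTURBING = c("3", "1", "3", "3", "4"),
  GOOD       = c("5", "5", "3", "2", "2")
)

# Convert the character scores to integers, average each column,
# and sort so the highest-scoring item comes first
item_means <- sort(colMeans(sapply(items, as.integer)), decreasing = TRUE)
item_means

# One bar per item, in the same descending order as the ggplot2 version
barplot(item_means, las = 2)
```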

Upvotes: 1
