Reputation: 49
I'm analyzing a dataset for my master's thesis. The data come from a survey I created. I have searched for a solution, but because not every item uses the full range of the scale, I run into length problems (Problem 1). The scale is a 7-point Likert scale, yet some items have values ranging from 1 to 7 and others only from 2 to 6. If, for example, an item never received a 7, its frequency table has a different length from that of an item where every value was chosen at least once.
structure(list(AD_BORING_1 = c("3", "2", "4", "1", "6", "3",
"7", "6", "2", "3", "5", "4", "6", "5", "5", "6", "5", "2", "2",
"6", "2", "3", "5", "4", "5", "5", "1", "2", "4", "2", "3", "6",
"5", "5", "3"), AD_IRRITATING_1 = c("3", "2", "2", "1", "7",
"5", "6", "4", "5", "5", "1", "5", "4", "3", "5", "6", "5", "2",
"2", "4", "5", "3", "2", "4", "3", "4", "1", "2", "4", "5", "4",
"4", "7", "4", "2"), AD_DISTURBING_1 = c("3", "1", "3", "3",
"4", "1", "3", "2", "2", "4", "1", "3", "4", "2", "1", "4", "2",
"2", "2", "4", "1", "5", "1", "2", "2", "2", "1", "2", "4", "2",
"4", "2", "4", "6", "2"), AD_CREDIBLE_1 = c("5", "5", "3", "2",
"1", "2", "6", "3", "6", "3", "5", "4", "2", "3", "4", "1", "5",
"3", "3", "2", "1", "3", "5", "3", "2", "4", "6", "6", "3", "1",
"5", "6", "2", "3", "5"), AD_GOOD_1 = c("5", "5", "3", "2", "2",
"5", "3", "4", "5", "2", "5", "2", "1", "5", "4", "2", "2", "5",
"5", "2", "3", "5", "4", "4", "4", "4", "6", "4", "3", "2", "4",
"4", "1", "4", "5"), AD_HONEST_1 = c("5", "3", "3", "2", "2",
"1", "4", "3", "5", "2", "6", "1", "2", "2", "3", "2", "4", "3",
"2", "2", "2", "3", "2", "4", "1", "3", "4", "3", "2", "2", "3",
"5", "1", "4", "3"), AD_TRUTHFUL_1 = c("5", "3", "4", "2", "2",
"1", "5", "3", "5", "2", "5", "2", "2", "3", "3", "2", "5", "3",
"2", "1", "2", "2", "4", "5", "1", "3", "4", "4", "4", "1", "2",
"3", "1", "1", "3"), AD_LIKEABLE_1 = c("5", "4", "3", "2", "2",
"6", "2", "4", "5", "4", "4", "3", "3", "4", "3", "4", "5", "6",
"7", "1", "2", "2", "2", "4", "1", "3", "6", "6", "2", "4", "1",
"4", "1", "3", "5"), AD_ENJOYABLE_1 = c("5", "5", "3", "2", "2",
"4", "2", "4", "5", "4", "5", "3", "2", "6", "3", "2", "5", "6",
"7", "2", "2", "2", "4", "5", "2", "3", "7", "6", "3", "4", "4",
"3", "1", "3", "4"), LIKE_1 = c("6", "5", "3", "2", "1", "4",
"2", "3", "5", "3", "4", "3", "1", "4", "3", "3", "5", "5", "7",
"1", "4", "5", "4", "4", "2", "4", "6", "6", "4", "3", "4", "4",
"1", "2", "5")), row.names = c(NA, -35L), class = c("tbl_df",
"tbl", "data.frame"))
Each row of the main dataset is one observation, and each column holds the scores for one item.
Another problem is that I have no idea how to plot all the items together so they can be compared in a simple bar plot, like the one in the picture below:
I tried with items of the same length using this code:
prova <- data.frame(table(A_DF_GIL$AD_BORING_1), table(A_DF_GIL$AD_IRRITATING_1))
barplot(as.matrix(prova))
but the result is still not what I need. Can anybody help me, please? Thank you.
Upvotes: 0
Views: 133
Reputation: 5254
Here's an updated response using your fuller dataset, now that I understand your goal better.
While processing the data I used a renaming function with a regex to clean up the names, but this is optional. I also converted the scores to a factor so it's easy to treat them as ordinal, discrete data (which they are) rather than continuous. However, I keep them numeric in the bottom example to calculate a mean(), which is one option for decluttering the plot.
Given the large volume of data, I opted for a stacked bar plot using geom_bar(position = "stack"), but try "dodge" to see for yourself.
I also commented out the line that facets by LIKE, but you should try it to see whether that is more informative.
I also applied some aesthetics that I subjectively like, mostly to demonstrate how much of the plot you can customize in {ggplot2}.
I applied Likert scale labels to the color scale; you can customize them if that's helpful, or omit them by dropping labels = likert_breaks.
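To show what the renaming regex does, here is a minimal sketch on a few of the column names. The vector old_names is just an illustration and is not part of the pipeline; the pattern is the same one used in the code that follows.

```r
library(stringr)

# Illustrative vector of column names (not part of the original pipeline)
old_names <- c("AD_BORING_1", "AD_IRRITATING_1", "LIKE_1")

# Keep only the text between the first and the last underscore.
# "LIKE_1" has a single underscore, so nothing matches and the result is NA,
# which is why LIKE_1 is renamed separately with rename().
new_names <- str_extract(old_names, "(?<=_)(.*)(?=_)")
new_names
```

The lookbehind and lookahead only assert that an underscore sits on each side of the match, so the underscores themselves are not part of the extracted name.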
library(tidyverse)
d <- structure(list(AD_BORING_1 = c("3", "2", "4", "1", "6", "3", "7", "6", "2", "3", "5", "4", "6", "5", "5", "6", "5", "2", "2", "6", "2", "3", "5", "4", "5", "5", "1", "2", "4", "2", "3", "6", "5", "5", "3"), AD_IRRITATING_1 = c("3", "2", "2", "1", "7", "5", "6", "4", "5", "5", "1", "5", "4", "3", "5", "6", "5", "2", "2", "4", "5", "3", "2", "4", "3", "4", "1", "2", "4", "5", "4", "4", "7", "4", "2"), AD_DISTURBING_1 = c("3", "1", "3", "3", "4", "1", "3", "2", "2", "4", "1", "3", "4", "2", "1", "4", "2", "2", "2", "4", "1", "5", "1", "2", "2", "2", "1", "2", "4", "2", "4", "2", "4", "6", "2"), AD_CREDIBLE_1 = c("5", "5", "3", "2", "1", "2", "6", "3", "6", "3", "5", "4", "2", "3", "4", "1", "5", "3", "3", "2", "1", "3", "5", "3", "2", "4", "6", "6", "3", "1", "5", "6", "2", "3", "5"), AD_GOOD_1 = c("5", "5", "3", "2", "2", "5", "3", "4", "5", "2", "5", "2", "1", "5", "4", "2", "2", "5", "5", "2", "3", "5", "4", "4", "4", "4", "6", "4", "3", "2", "4", "4", "1", "4", "5"), AD_HONEST_1 = c("5", "3", "3", "2", "2", "1", "4", "3", "5", "2", "6", "1", "2", "2", "3", "2", "4", "3", "2", "2", "2", "3", "2", "4", "1", "3", "4", "3", "2", "2", "3", "5", "1", "4", "3"), AD_TRUTHFUL_1 = c("5", "3", "4", "2", "2", "1", "5", "3", "5", "2", "5", "2", "2", "3", "3", "2", "5", "3", "2", "1", "2", "2", "4", "5", "1", "3", "4", "4", "4", "1", "2", "3", "1", "1", "3"), AD_LIKEABLE_1 = c("5", "4", "3", "2", "2", "6", "2", "4", "5", "4", "4", "3", "3", "4", "3", "4", "5", "6", "7", "1", "2", "2", "2", "4", "1", "3", "6", "6", "2", "4", "1", "4", "1", "3", "5"), AD_ENJOYABLE_1 = c("5", "5", "3", "2", "2", "4", "2", "4", "5", "4", "5", "3", "2", "6", "3", "2", "5", "6", "7", "2", "2", "2", "4", "5", "2", "3", "7", "6", "3", "4", "4", "3", "1", "3", "4"), LIKE_1 = c("6", "5", "3", "2", "1", "4", "2", "3", "5", "3", "4", "3", "1", "4", "3", "3", "5", "5", "7", "1", "4", "5", "4", "4", "2", "4", "6", "6", "4", "3", "4", "4", "1", "2", "5")), row.names = c(NA, -35L), class = c("tbl_df", 
"tbl", "data.frame"))
# make labels in case that helps
likert_breaks <- c("Strongly Disagree", "Somewhat Disagree", "Slightly Disagree", "Neutral", "Slightly Agree", "Somewhat Agree", "Strongly Agree")
# process and plot as stacked bar plot with optional faceting
d %>%
  # keep only the adjective from names such as AD_BORING_1
  rename_with(~ str_extract(.x, "(?<=_)(.*)(?=_)"), contains("AD")) %>%
  rename(LIKE = LIKE_1) %>%
  # scores are stored as character, so convert them to integers
  mutate(across(everything(), as.integer)) %>%
  mutate(respondent = row_number()) %>%
  # long format: one row per respondent and adjective
  pivot_longer(-c(respondent, LIKE), names_to = "adjective", values_to = "likert") %>%
  mutate(likert = factor(likert)) %>%
  ggplot(aes(adjective, fill = likert)) +
  geom_bar(stat = "count", position = "stack") +
  # facet_wrap(~LIKE) +
  scale_fill_viridis_d(option = "A", begin = 0, end = 0.85, labels = likert_breaks) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90))
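If you want to try the "dodge" layout, here is a minimal sketch on a small made-up data frame (toy and its values are hypothetical, not your data). It puts the score on the x axis and the items side by side, and it fixes the factor levels to 1:7 so that a score nobody chose still gets a slot on the axis. Whether this layout matches the picture you had in mind is an assumption on my part.

```r
library(ggplot2)
library(tidyr)

# Hypothetical toy data: IRRITATING never receives a 1 or a 7
toy <- data.frame(
  BORING     = c(3, 2, 4, 1, 6, 3, 7),
  IRRITATING = c(3, 2, 2, 5, 6, 4, 5)
)

# Long format: one row per response, with the item name in `adjective`
toy_long <- pivot_longer(toy, everything(),
                         names_to = "adjective", values_to = "likert")

# Fix the levels so every item shares the same 7 categories
toy_long$likert <- factor(toy_long$likert, levels = 1:7)

p <- ggplot(toy_long, aes(likert, fill = adjective)) +
  # preserve = "single" keeps bar widths equal when an item has no count
  geom_bar(position = position_dodge(preserve = "single")) +
  # drop = FALSE keeps unused scores on the axis
  scale_x_discrete(drop = FALSE) +
  labs(x = "Likert score", y = "Number of responses") +
  theme_bw()
print(p)
```

With ten items the dodged bars get narrow, so this probably works best on a subset of items or combined with faceting.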
Another option to keep things more compact is to show just the mean score for each adjective. In this case you need to leave the scores as numeric so you can apply a summarizing function like mean().
# just show mean likert scores for each
d %>%
  rename_with(~ str_extract(.x, "(?<=_)(.*)(?=_)"), contains("AD")) %>%
  rename(LIKE = LIKE_1) %>%
  mutate(across(everything(), as.integer)) %>%
  mutate(respondent = row_number()) %>%
  pivot_longer(-c(respondent, LIKE), names_to = "adjective", values_to = "likert") %>%
  # one mean score per adjective
  group_by(adjective) %>%
  summarise(likert = mean(likert)) %>%
  # order the bars from highest to lowest mean
  ggplot(aes(reorder(adjective, -likert), likert)) +
  geom_col() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90))
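As for the length problem in the question (Problem 1), here is a minimal base R sketch, assuming the scores are stored as the character values "1" to "7" as in your dput output. The data frame toy is made up for illustration. Converting each item to a factor with fixed levels 1:7 before calling table() gives every item a frequency table of length 7, with a count of 0 for scores that were never chosen, so the tables can be combined into one matrix for barplot().

```r
# Hypothetical toy data: AD_IRRITATING_1 never receives a 1 or a 7
toy <- data.frame(
  AD_BORING_1     = c("3", "2", "4", "1", "6", "3", "7"),
  AD_IRRITATING_1 = c("3", "2", "2", "5", "6", "4", "5")
)

# Fixed levels make every table the same length (unused scores count as 0)
counts <- sapply(toy, function(x) table(factor(x, levels = 1:7)))

# counts is a 7 x 2 matrix: one row per score, one column per item
barplot(counts, beside = TRUE, legend.text = rownames(counts),
        xlab = "Item", ylab = "Number of responses")
```

Passing t(counts) instead would group the bars by score and place the items side by side within each score.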
Created on 2022-02-08 by the reprex package (v2.0.1)
Upvotes: 1