Reputation: 49
I would like assistance on how to manipulate, describe, summarize and visualize Likert questions in R.
Here is the dataset I am using: https://docs.google.com/spreadsheets/d/1Kje8K4Ow_Io4wdMikntO1vB-g12fLzJK5fPEBYRhIFE/edit?usp=sharing
The Likert questions are on a scale of 1 to 5, where 1 = Strongly Disagree, 2 = Disagree, 3 = Moderately Agree, 4 = Agree and 5 = Strongly Agree.
From the data, I am interested in columns 11, 12, 13 and 14.
I would like to summarize columns 11, 12, 13 and 14 with the total count and percentage for each point on the scale, and calculate the sum total, mean and standard deviation for each column.
Here is an example of the expected data output: Expected output
I would also like to create a Likert plot for the data.
I am struggling to output the data, specifically the descriptive statistics.
A step-by-step guide would really help.
Upvotes: 3
Views: 379
Reputation: 33812
I read your data into R using googlesheets4:
library(googlesheets4)
dataset <- read_sheet("1Kje8K4Ow_Io4wdMikntO1vB-g12fLzJK5fPEBYRhIFE")
We can generate a table somewhat like your example by using dplyr and tidyr to select the columns, pivot the data to a long form, and then group on the items to perform the summary calculations.
We use weighted.mean for the mean and wtd.var from the Hmisc package to get the weighted standard deviation.
library(dplyr)
library(tidyr)
dataset_sum <- dataset %>%
  select(11:14) %>%
  pivot_longer(everything()) %>%
  group_by(name, value) %>%
  summarise(Count = n()) %>%
  group_by(name) %>%
  mutate(`%` = 100 * (Count / sum(Count)),
         wMean = weighted.mean(value, Count),
         wSD = sqrt(Hmisc::wtd.var(value, Count)),
         Total = sum(Count)) %>%
  ungroup() %>%
  pivot_wider(names_from = "value",
              names_sep = " ",
              values_from = c("Count", "%"),
              names_vary = "slowest")
Result:
# A tibble: 4 × 14
name wMean wSD Total `Count 1` `% 1` `Count 2` `% 2` `Count 3` `% 3` `Count 4` `% 4` `Count 5` `% 5`
<chr> <dbl> <dbl> <int> <int> <dbl> <int> <dbl> <int> <dbl> <int> <dbl> <int> <dbl>
1 Data from ministries or affiliated government agencies is easily accessible onli… 3.63 1.08 180 8 4.44 18 10 48 26.7 65 36.1 41 22.8
2 Data from ministries or government affiliated agencies in Rwanda is publicly ava… 3.68 1.07 180 8 4.44 15 8.33 47 26.1 67 37.2 43 23.9
3 Datasets available on Rwanda's open data platforms are free for anyone to access… 3.36 1.26 180 20 11.1 23 12.8 48 26.7 50 27.8 39 21.7
4 Datasets on Rwanda's open data platforms cover all your areas of service provisi… 3.24 1.10 180 15 8.33 21 11.7 74 41.1 45 25 25 13.9
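As a sanity check on the weighted statistics, here is a minimal base R sketch that recomputes the mean and standard deviation from the counts in the second row of the result above. It assumes the weights are plain frequencies and that the variance uses an n - 1 denominator, which seems to be what Hmisc::wtd.var does with its defaults; the variable names are only illustrative.

```r
# Counts for responses 1..5, copied from the second row of the result above
scores <- 1:5
counts <- c(8, 15, 47, 67, 43)

# Weighted mean: each score weighted by how often it was chosen
w_mean <- weighted.mean(scores, counts)

# Weighted sample variance with an (n - 1) denominator, where n = sum(counts)
# (assumption: this mirrors the frequency-weight behaviour of Hmisc::wtd.var)
w_var <- sum(counts * (scores - w_mean)^2) / (sum(counts) - 1)
w_sd <- sqrt(w_var)

# Cross-check: expand the counts back into individual responses
raw <- rep(scores, times = counts)

round(c(mean = w_mean, sd = w_sd), 2)
all.equal(c(w_mean, w_sd), c(mean(raw), sd(raw)))
```

If the two approaches agree, the weighted version is just a shortcut for calling mean() and sd() on the original column.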
For Likert analysis we can use the likert package.
library(likert)
First we need to convert the four columns to factor variables. Important: we need a data frame, not a tibble, for likert to work:
dataset_f <- dataset %>%
  select(11:14) %>%
  mutate(across(everything(), ~factor(.x, ordered = TRUE, levels = as.character(1:5)))) %>%
  as.data.frame()
dataset_lik <- likert(dataset_f)
The summary function gives us something similar to the previous summarization:
summary(dataset_lik)
Item low neutral high mean sd
1 Data from ministries or government affiliated agencies in Rwanda is publicly available online or in digital formats 12.8 26.1 61.1 3.68 1.07
2 Data from ministries or affiliated government agencies is easily accessible online or in digital formats and quick to find and use 14.4 26.7 58.9 3.63 1.08
3 Datasets available on Rwanda's open data platforms are free for anyone to access, use and share it 23.9 26.7 49.4 3.36 1.26
4 Datasets on Rwanda's open data platforms cover all your areas of service provision/mandate 20 41.1 38.9 3.24 1.10
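Judging from the numbers above (not from the package documentation), the low, neutral and high columns look like the combined percentages of responses 1 and 2, response 3, and responses 4 and 5. A small base R sketch that reproduces the first row from its counts, under that assumption:

```r
# Counts for responses 1..5 for the "publicly available" item,
# copied from the earlier summary table
counts <- c(`1` = 8, `2` = 15, `3` = 47, `4` = 67, `5` = 43)
pct <- 100 * counts / sum(counts)

# Assumed grouping: 1-2 = low, 3 = neutral, 4-5 = high
groups <- c(
  low     = sum(pct[c("1", "2")]),
  neutral = pct[["3"]],
  high    = sum(pct[c("4", "5")])
)

round(groups, 1)
```

The three values should line up with the low, neutral and high entries shown for that item.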
And we can also plot the likert object:
plot(dataset_lik)
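If you would rather see the response wording than the numbers 1 to 5 in the plot legend, one option is to attach labels when building the factors. This is only a sketch on made-up data: the toy data frame and its column names are hypothetical, and the labels are taken from the scale described in the question.

```r
# Hypothetical toy responses coded 1-5 (not the real survey data)
toy <- data.frame(
  item_a = c(1, 3, 4, 5, 2, 4),
  item_b = c(2, 3, 3, 5, 4, 1)
)

# Labels from the scale in the question
likert_labels <- c("Strongly Disagree", "Disagree", "Moderately Agree",
                   "Agree", "Strongly Agree")

# Convert every column to an ordered factor with the same five levels,
# keeping a plain data frame rather than a tibble
toy_f <- as.data.frame(
  lapply(toy, factor, levels = 1:5, labels = likert_labels, ordered = TRUE)
)

str(toy_f)

# With the likert package installed, this could then be plotted as before:
# plot(likert::likert(toy_f))
```

Every column needs the same set of levels, even levels that nobody chose, which is why levels = 1:5 is given explicitly.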
Upvotes: 4
Reputation: 582
I tried to do the first few parts of this as a learning exercise. I am also a beginner, so take this with that caveat, but I hope it helps a bit.
library(tidyverse)
I created a toy dataset from the first 10 rows of your data.
df <- data.frame(
  Familiarity = c(3, 5, 2, 4, 2, 3, 5, 2, 3, 4),
  Accessibility = c(3, 5, 3, 4, 2, 3, 4, 2, 2, 4),
  EaseOfUse = c(3, 4, 2, 3, 2, 3, 3, 2, 2, 4),
  ReleaseSystematic = c(4, 5, 3, 4, 2, 4, 4, 4, 4, 4))
Then I gave each of the variables factor levels.
df2 <- df %>%
  mutate(Familiarity = factor(Familiarity, levels = c(1:5)),
         Accessibility = factor(Accessibility, levels = c(1:5)),
         EaseOfUse = factor(EaseOfUse, levels = c(1:5)),
         ReleaseSystematic = factor(ReleaseSystematic, levels = c(1:5)))
Then I created tables with the summary counts for each variable. I looked around but couldn't find or understand a simpler way to do this.
table_familiarity <- (table(df2$Familiarity))
table_accessibility <- (table(df2$Accessibility))
table_ease <- (table(df2$EaseOfUse))
table_release <- (table(df2$ReleaseSystematic))
df3 <- addmargins(rbind(table_familiarity, table_accessibility, table_ease, table_release))
This is the table of counts:
df4 <- as.data.frame(df3) %>%
select(-Sum) %>%
filter(row_number() != 5)
And here is the table of proportions:
proportions <- df4 %>%
  as.matrix() %>%
  prop.table(margin = 1) * 100
prop_table <- as.data.frame(proportions)
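For comparison, here is a shorter base R sketch of the same counts and percentages, plus the total, mean and standard deviation per column. It re-creates the toy data so it runs on its own; the helper to_factor and the object names are just illustrative.

```r
# Same toy data as above, re-created so the sketch is self-contained
df <- data.frame(
  Familiarity = c(3, 5, 2, 4, 2, 3, 5, 2, 3, 4),
  Accessibility = c(3, 5, 3, 4, 2, 3, 4, 2, 2, 4),
  EaseOfUse = c(3, 4, 2, 3, 2, 3, 3, 2, 2, 4),
  ReleaseSystematic = c(4, 5, 3, 4, 2, 4, 4, 4, 4, 4))

# Illustrative helper: fix the levels so unused responses still get a 0 count
to_factor <- function(x) factor(x, levels = 1:5)

# One row per variable, one column per response 1..5
counts <- t(sapply(df, function(x) table(to_factor(x))))

# Row-wise percentages
percents <- prop.table(counts, margin = 1) * 100

# Number of responses, mean and standard deviation per variable
stats <- data.frame(
  Total = rowSums(counts),
  Mean = sapply(df, mean),
  SD = sapply(df, sd)
)

counts
round(percents, 1)
stats
```

Because every column is tabulated against the same five levels, sapply can stack the results into a single matrix, which avoids building one table per variable by hand.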
Hopefully someone else may be able to help with the other parts of your question and I am interested to read better approaches.
Upvotes: 0