Reputation: 5738
I have survey data recorded on a 5-point Likert scale. However, in some response columns, some factor levels are missing. Here is the data:
Increased student engagement ,Instructional time effectiveness increased,Increased student confidence,Increased student performance in class assignments,Increased learning of the students,Added unique learning activities
Strongly agree,Strongly agree,Strongly agree,Strongly agree,Strongly agree,Strongly agree
Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree
Disagree,Strongly disagree,Neither agree nor disagree,Disagree,Disagree,Neither agree nor disagree
As you can see, some response columns are missing some factor levels; e.g. in the first column, Agree and Strongly disagree are missing (for simplicity, I have pasted only a subset of the actual data set).
I am using the following code in R:
facultyData <- read_excel("FacultyResponsesForR.xlsx")
facultyData[] <- lapply( facultyData, factor)
facultyData[1:6] <- lapply( facultyData[1:6], factor, levels=1:5)
likertData <- likert(facultyData, nlevels = 5)
plot(likertData)
However, this is leading to the following error:
Error in mean(as.numeric(items[, i]), na.rm = TRUE) :
(list) object cannot be coerced to type 'double'
I have tried the solution mentioned in other posts (the commented-out line facultyData[] <- lapply(facultyData[], factor, levels=1:5)), but it doesn't work either.
Apparently, before executing this lapply call, the data contains:
# A tibble: 14 × 1
`Increased student engagement`
<fctr>
1 Strongly agree
2 Agree
3 Agree
4 Agree
5 Agree
6 Agree
7 Agree
8 Agree
9 Agree
10 Neither agree nor disagree
11 Neither agree nor disagree
12 Neither agree nor disagree
13 Neither agree nor disagree
14 Disagree
After executing it, the data is overwritten with NA values. Why is this happening?
> facultyData[1:6] <- lapply( facultyData[1:6], factor, levels=1:5)
> facultyData[,1]
# A tibble: 14 × 1
`Increased student engagement`
<fctr>
1 NA
2 NA
3 NA
4 NA
5 NA
6 NA
7 NA
8 NA
9 NA
10 NA
11 NA
12 NA
13 NA
14 NA
After changing the code as follows, the data is retained (it doesn't become NA), yet I get the same error:
mylevels <- c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree')
facultyData <- read_excel("FacultyResponsesForR.xlsx")
facultyData[] <- lapply( facultyData, factor)
facultyData[1:6] <- lapply( facultyData[1:6], factor, levels=mylevels)
This solution doesn't work for me - https://github.com/jbryer/likert/blob/master/demo/UnusedLevels.R
Upvotes: 1
Views: 2240
Reputation: 3314
I created an Excel file with your sample data. Reading this in with read_excel gives a result as follows:
library(readxl)
dat <- read_excel("factor_labels.xlsx")
dat
#> # A tibble: 3 × 6
#> `Increased student engagement`
#> <chr>
#> 1 Strongly agree
#> 2 Neither agree nor disagree
#> 3 Disagree
#> # ... with 5 more variables: `Instructional time effectiveness
#> # increased` <chr>, `Increased student confidence` <chr>, `Increased
#> # student performance in class assignments` <chr>, `Increased learning
#> # of the students` <chr>, `Added unique learning activities` <chr>
You are right that read_excel does not convert character variables to factors - this is deliberate, as it is often unnecessary or inappropriate to treat character variables as categorical. Even when we do want to convert to factor, it is good practice to do this explicitly to ensure the factors have the right levels, in the right order (by default the factor will be created with the levels present in the variable, sorted alphabetically). Sometimes we might want to do more complicated things like rename levels or regroup levels, but here we don't want to change the levels, merely specify the full set of levels. One way to create the required factors is with mutate_all from dplyr:
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dat <- dat %>% mutate_all(factor, levels = mylevels)
dat
#> # A tibble: 3 × 6
#> `Increased student engagement`
#> <fctr>
#> 1 Strongly agree
#> 2 Neither agree nor disagree
#> 3 Disagree
#> # ... with 5 more variables: `Instructional time effectiveness
#> # increased` <fctr>, `Increased student confidence` <fctr>, `Increased
#> # student performance in class assignments` <fctr>, `Increased learning
#> # of the students` <fctr>, `Added unique learning activities` <fctr>
lapply(dat, levels)
#> $`Increased student engagement`
#> [1] "Strongly disagree" "Disagree"
#> [3] "Neither agree nor disagree" "Agree"
#> [5] "Strongly agree"
#>
#> $`Instructional time effectiveness increased`
#> [1] "Strongly disagree" "Disagree"
#> [3] "Neither agree nor disagree" "Agree"
#> [5] "Strongly agree"
#>
#> $`Increased student confidence`
#> [1] "Strongly disagree" "Disagree"
#> [3] "Neither agree nor disagree" "Agree"
#> [5] "Strongly agree"
#>
#> $`Increased student performance in class assignments`
#> [1] "Strongly disagree" "Disagree"
#> [3] "Neither agree nor disagree" "Agree"
#> [5] "Strongly agree"
#>
#> $`Increased learning of the students`
#> [1] "Strongly disagree" "Disagree"
#> [3] "Neither agree nor disagree" "Agree"
#> [5] "Strongly agree"
#>
#> $`Added unique learning activities`
#> [1] "Strongly disagree" "Disagree"
#> [3] "Neither agree nor disagree" "Agree"
#> [5] "Strongly agree"
Note the change from <chr> to <fctr> in the printout. Compare this to the read.csv solution:
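# stringsAsFactors = TRUE is needed in R >= 4.0.0, where read.csv() keeps strings as character by default
facultyData <- read.csv("factor_labels.csv", stringsAsFactors = TRUE)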
lapply(facultyData, levels)
#> $Increased.student.engagement
#> [1] "Disagree" "Neither agree nor disagree"
#> [3] "Strongly agree"
#>
#> $Instructional.time.effectiveness.increased
#> [1] "Neither agree nor disagree" "Strongly agree"
#> [3] "Strongly disagree"
#>
#> $Increased.student.confidence
#> [1] "Neither agree nor disagree" "Strongly agree"
#>
#> $Increased.student.performance.in.class.assignments
#> [1] "Disagree" "Neither agree nor disagree"
#> [3] "Strongly agree"
#>
#> $Increased.learning.of.the.students
#> [1] "Disagree" "Neither agree nor disagree"
#> [3] "Strongly agree"
#>
#> $Added.unique.learning.activities
#> [1] "Neither agree nor disagree" "Strongly agree"
Because the columns in this subset don't contain every level, the number of levels varies from column to column and the levels are in alphabetical rather than scale order, both of which would need to be fixed. This is a common source of error and frustration further down the line!
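To connect this back to the two symptoms in the question, here is a minimal base R sketch. It rests on two assumptions of mine rather than anything confirmed in the thread: that the NA values come from factor() matching the text against the levels "1" to "5", and that the coercion error comes from read_excel returning a tibble, for which `items[, i]` stays a one-column table instead of a vector. The `responses` vector is made up for illustration, and `drop = FALSE` is only used to imitate the tibble behaviour with a plain data frame.

```r
# Full set of response labels, in scale order (taken from the question).
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")

# Hypothetical stand-in for one survey column.
responses <- c("Strongly agree", "Agree", "Neither agree nor disagree", "Disagree")

# levels = 1:5 asks factor() to match the text against "1".."5";
# nothing matches, so every value becomes NA.
as_codes <- factor(responses, levels = 1:5)
print(all(is.na(as_codes)))

# Using the labels themselves as levels keeps the data and the unused levels.
as_labels <- factor(responses, levels = mylevels)
print(nlevels(as_labels))
print(as.integer(as_labels))

# Imitate a tibble: single-bracket column selection that stays a table.
# as.numeric() cannot coerce that list-like object, so it raises an error.
items <- data.frame(q1 = as_labels)
tibble_like <- tryCatch(as.numeric(items[, 1, drop = FALSE]),
                        error = function(e) conditionMessage(e))
print(tibble_like)

# A plain data frame drops to a vector here, and the coercion succeeds.
vector_like <- as.numeric(items[, 1])
print(vector_like)
```

If that diagnosis applies to your data, converting the tibble with as.data.frame() before passing it to likert() may be enough to make the error go away, once the factors carry the label-based levels.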
Upvotes: 2
Reputation: 3194
Rewriting your data was no fun, and this took a bit to figure out but I think this will help you. Someone may have a shorter way. Let me know if it helps.
df <- rbind(c("Strongly agree","Strongly agree","Strongly agree","Strongly agree","Strongly agree","Strongly agree"),
c("Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree"),
c("Disagree","Strongly disagree","Neither agree nor disagree","Disagree","Disagree","Neither agree nor disagree"))
df <- as.data.frame(df)
colnames(df) <- c("Increased student engagement", "Instructional time effectiveness increased", "Increased student confidence", "Increased student performance in class assignments", "Increased learning of the students", "Added unique learning activities")
lookup <- data.frame(levels = 1:5, mylabels = c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree'))
df.1 <- as.data.frame(apply(df, 2, function(x) match(x, lookup$mylabels)))
df.new <- as.data.frame(lapply(as.list(df.1), factor, levels = lookup$levels, labels = lookup$mylabels))
str(df.new)
'data.frame': 3 obs. of 6 variables:
$ Increased.student.engagement : Factor w/ 5 levels "Strongly disagree",..: 5 3 2
$ Instructional.time.effectiveness.increased : Factor w/ 5 levels "Strongly disagree",..: 5 3 1
$ Increased.student.confidence : Factor w/ 5 levels "Strongly disagree",..: 5 3 3
$ Increased.student.performance.in.class.assignments: Factor w/ 5 levels "Strongly disagree",..: 5 3 2
$ Increased.learning.of.the.students : Factor w/ 5 levels "Strongly disagree",..: 5 3 2
$ Added.unique.learning.activities : Factor w/ 5 levels "Strongly disagree",..: 5 3 3
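On the "shorter way": a possible sketch in base R that skips the numeric lookup step and passes the labels directly as levels. The two-column `df` below is a cut-down, hypothetical version of the data above, and `check.names = FALSE` is my choice so that the original column names (with spaces) survive.

```r
# Full set of labels, in scale order (same as lookup$mylabels above).
mylabels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")

# Cut-down example data: two of the six survey columns.
df <- data.frame(
  "Increased student engagement" =
    c("Strongly agree", "Neither agree nor disagree", "Disagree"),
  "Instructional time effectiveness increased" =
    c("Strongly agree", "Neither agree nor disagree", "Strongly disagree"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# df[] <- ... replaces the columns in place, so df stays a data frame
# and keeps its column names; each column gets all five levels.
df[] <- lapply(df, factor, levels = mylabels)

str(df)
```

Assigning into `df[]` rather than rebuilding the data frame avoids the name mangling (spaces turned into dots) seen in the str() output above.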
Upvotes: 2