Vipin Verma

Reputation: 5738

Likert in R with unequal number of factor levels

I have some survey data on a 5-point Likert scale. However, in some response columns, some factor levels are missing. Here is the data:

Increased student engagement ,Instructional time effectiveness increased,Increased student confidence,Increased student performance in class assignments,Increased learning of the students,Added unique learning activities

Strongly agree,Strongly agree,Strongly agree,Strongly agree,Strongly agree,Strongly agree

Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree

Disagree,Strongly disagree,Neither agree nor disagree,Disagree,Disagree,Neither agree nor disagree

As you can see, some response columns are missing some levels; for example, in the first column, Agree and Strongly disagree are missing. (For simplicity, I have pasted a subset of the actual data set.)

I am using the following code in R:

library(readxl)  # provides read_excel()
library(likert)  # provides likert()
facultyData <- read_excel("FacultyResponsesForR.xlsx")
facultyData[] <- lapply( facultyData, factor)
facultyData[1:6] <- lapply( facultyData[1:6], factor, levels=1:5)
likertData <- likert(facultyData, nlevels = 5)
plot(likertData)

However, this is leading to the following error:

Error in mean(as.numeric(items[, i]), na.rm = TRUE) : 
  (list) object cannot be coerced to type 'double'

I have tried the solution suggested in other posts, facultyData[] <- lapply(facultyData[], factor, levels=1:5), but it doesn't work either.

Before executing this lapply, the data contains:

# A tibble: 14 × 1
   `Increased student engagement`
                           <fctr>
1                  Strongly agree
2                           Agree
3                           Agree
4                           Agree
5                           Agree
6                           Agree
7                           Agree
8                           Agree
9                           Agree
10     Neither agree nor disagree
11     Neither agree nor disagree
12     Neither agree nor disagree
13     Neither agree nor disagree
14                       Disagree

After executing it, the data is overwritten with NA values. Why is this happening?

> facultyData[1:6] <- lapply( facultyData[1:6], factor, levels=1:5)
> facultyData[,1]
# A tibble: 14 × 1
   `Increased student engagement`
                           <fctr>
1                              NA
2                              NA
3                              NA
4                              NA
5                              NA
6                              NA
7                              NA
8                              NA
9                              NA
10                             NA
11                             NA
12                             NA
13                             NA
14                             NA

After changing the code as follows, the data is retained (it doesn't become NA), yet I get the same error:

mylevels <- c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree')
facultyData <- read_excel("FacultyResponsesForR.xlsx")
facultyData[] <- lapply( facultyData, factor)
facultyData[1:6] <- lapply( facultyData[1:6], factor, levels=mylevels)

This solution doesn't work for me - https://github.com/jbryer/likert/blob/master/demo/UnusedLevels.R

Upvotes: 1

Views: 2240

Answers (2)

Heather Turner

Reputation: 3314

I created an Excel file with your sample data. Reading this in with read_excel gives a result as follows

library(readxl)
dat <- read_excel("factor_labels.xlsx")
dat
#> # A tibble: 3 × 6
#>   `Increased student engagement`
#>                            <chr>
#> 1                 Strongly agree
#> 2     Neither agree nor disagree
#> 3                       Disagree
#> # ... with 5 more variables: `Instructional time effectiveness
#> #   increased` <chr>, `Increased student confidence` <chr>, `Increased
#> #   student performance in class assignments` <chr>, `Increased learning
#> #   of the students` <chr>, `Added unique learning activities` <chr>

You are right that read_excel does not convert character variables to factors. This is deliberate, as it is often unnecessary or inappropriate to treat character variables as categorical. Even when we do want to convert to factor, it is good practice to do this explicitly to ensure the factors have the right levels, in the right order (by default the factor will be created with the levels present in the variable, sorted alphabetically).

Sometimes we might want to do more complicated things like rename levels or regroup levels, but here we don't want to change the levels, merely specify the full set of levels. One way to create the required factors is with mutate_all from dplyr:

mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree", 
  "Agree", "Strongly agree")

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
dat <- dat %>% mutate_all(factor, levels = mylevels)
dat
#> # A tibble: 3 × 6
#>   `Increased student engagement`
#>                           <fctr>
#> 1                 Strongly agree
#> 2     Neither agree nor disagree
#> 3                       Disagree
#> # ... with 5 more variables: `Instructional time effectiveness
#> #   increased` <fctr>, `Increased student confidence` <fctr>, `Increased
#> #   student performance in class assignments` <fctr>, `Increased learning
#> #   of the students` <fctr>, `Added unique learning activities` <fctr>
lapply(dat, levels)
#> $`Increased student engagement`
#> [1] "Strongly disagree"          "Disagree"                  
#> [3] "Neither agree nor disagree" "Agree"                     
#> [5] "Strongly agree"            
#> 
#> $`Instructional time effectiveness increased`
#> [1] "Strongly disagree"          "Disagree"                  
#> [3] "Neither agree nor disagree" "Agree"                     
#> [5] "Strongly agree"            
#> 
#> $`Increased student confidence`
#> [1] "Strongly disagree"          "Disagree"                  
#> [3] "Neither agree nor disagree" "Agree"                     
#> [5] "Strongly agree"            
#> 
#> $`Increased student performance in class assignments`
#> [1] "Strongly disagree"          "Disagree"                  
#> [3] "Neither agree nor disagree" "Agree"                     
#> [5] "Strongly agree"            
#> 
#> $`Increased learning of the students`
#> [1] "Strongly disagree"          "Disagree"                  
#> [3] "Neither agree nor disagree" "Agree"                     
#> [5] "Strongly agree"            
#> 
#> $`Added unique learning activities`
#> [1] "Strongly disagree"          "Disagree"                  
#> [3] "Neither agree nor disagree" "Agree"                     
#> [5] "Strongly agree"

Note the change from <chr> to <fctr> in the printout. Compare this to the read.csv solution:

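# stringsAsFactors = TRUE was the default before R 4.0.0; set it explicitly to get factors
facultyData <- read.csv("factor_labels.csv", stringsAsFactors = TRUE)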
lapply(facultyData, levels)
#> $Increased.student.engagement
#> [1] "Disagree"                   "Neither agree nor disagree"
#> [3] "Strongly agree"            
#> 
#> $Instructional.time.effectiveness.increased
#> [1] "Neither agree nor disagree" "Strongly agree"            
#> [3] "Strongly disagree"         
#> 
#> $Increased.student.confidence
#> [1] "Neither agree nor disagree" "Strongly agree"            
#> 
#> $Increased.student.performance.in.class.assignments
#> [1] "Disagree"                   "Neither agree nor disagree"
#> [3] "Strongly agree"            
#> 
#> $Increased.learning.of.the.students
#> [1] "Disagree"                   "Neither agree nor disagree"
#> [3] "Strongly agree"            
#> 
#> $Added.unique.learning.activities
#> [1] "Neither agree nor disagree" "Strongly agree"

Since variables in the subset don't contain all levels, the number of levels varies and the levels aren't always in a logical order, which would need to be fixed. This is a common source of error/frustration further down the line!
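To connect this back to the NA values in the question, here is a minimal base R sketch of how factor() treats its levels argument. The vector x is a hypothetical stand-in for the first column of the sample data, not the asker's actual file. The values are matched against the levels as text, so levels = 1:5 finds no match for labels such as "Agree" and every entry becomes NA, while passing the label vector keeps the data and the unused levels.

```r
# Hypothetical three-response column, mirroring the first column of the sample data
x <- c("Strongly agree", "Neither agree nor disagree", "Disagree")

mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")

# Default: only the labels present in x become levels, sorted alphabetically
default_levels <- levels(factor(x))

# levels = 1:5: values are compared with "1" to "5"; no label matches, so all NA
f_numeric <- factor(x, levels = 1:5)

# levels = mylevels: every label matches, and unused levels are kept in order
f_labels <- factor(x, levels = mylevels)

print(default_levels)
print(f_numeric)
print(as.integer(f_labels))
```

The same matching applies when x is already a factor, which is why the second lapply call in the question wiped out the column.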

Upvotes: 2

Evan Friedland

Reputation: 3194

Rewriting your data was no fun, and this took a bit to figure out but I think this will help you. Someone may have a shorter way. Let me know if it helps.

df <- rbind(c("Strongly agree","Strongly agree","Strongly agree","Strongly agree","Strongly agree","Strongly agree"),
            c("Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree"),
            c("Disagree","Strongly disagree","Neither agree nor disagree","Disagree","Disagree","Neither agree nor disagree"))
df <- as.data.frame(df)
colnames(df) <- c("Increased student engagement", "Instructional time effectiveness increased", "Increased student confidence", "Increased student performance in class assignments", "Increased learning of the students", "Added unique learning activities")

lookup <- data.frame(levels = 1:5, mylabels = c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree'))

df.1 <- as.data.frame(apply(df, 2, function(x) match(x, lookup$mylabels)))
df.new <- as.data.frame(lapply(as.list(df.1), factor, levels = lookup$levels, labels = lookup$mylabels))

str(df.new)
'data.frame':   3 obs. of  6 variables:
 $ Increased.student.engagement                      : Factor w/ 5 levels "Strongly disagree",..: 5 3 2
 $ Instructional.time.effectiveness.increased        : Factor w/ 5 levels "Strongly disagree",..: 5 3 1
 $ Increased.student.confidence                      : Factor w/ 5 levels "Strongly disagree",..: 5 3 3
 $ Increased.student.performance.in.class.assignments: Factor w/ 5 levels "Strongly disagree",..: 5 3 2
 $ Increased.learning.of.the.students                : Factor w/ 5 levels "Strongly disagree",..: 5 3 2
 $ Added.unique.learning.activities                  : Factor w/ 5 levels "Strongly disagree",..: 5 3 3
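A possibly shorter variant, sketched under some assumptions: the two-column data frame below is a hand-built subset of the sample data, and assigning into df[] keeps the original column names instead of replacing spaces with dots. The as.data.frame() step and the guarded likert() call rest on the assumption (not confirmed anywhere in this thread) that likert expects a plain data.frame rather than the tibble returned by read_excel, which might explain the "cannot be coerced to type 'double'" error in the question.

```r
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")

# Two columns of the sample data, built by hand as a plain data.frame
df <- data.frame(
  "Increased student engagement" = c("Strongly agree",
                                     "Neither agree nor disagree",
                                     "Disagree"),
  "Increased student confidence" = c("Strongly agree",
                                     "Neither agree nor disagree",
                                     "Neither agree nor disagree"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# df[] <- keeps the data.frame structure and the original column names
df[] <- lapply(df, factor, levels = mylevels)

# No-op for this df, but it would drop the tibble class from read_excel output
items <- as.data.frame(df)
str(items)

# Only attempt the plot if the likert package is installed
if (requireNamespace("likert", quietly = TRUE)) {
  likertData <- likert::likert(items, nlevels = 5)
  plot(likertData)
}
```

With a data.frame, items[, 1] is a factor vector; with a tibble the same expression returns a one-column tibble, which is the difference this sketch tries to work around.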

Upvotes: 2
