Zoe

Reputation: 33

R - Trouble manipulating factors for likert package

I am working with likert data. I pulled four columns from my data frame with the following:

items <- df[, substr(names(df), 1, 11) == "RTAPOSTPrep"]

The result looks like this:

> items
   RTAPOSTPrep1_PDschool RTAPOSTPrep2_Pddistrict RTAPOSTPrep3_Pdregion RTAPOSTPrep4_PDnational
1    completely prepared     completely prepared   completely prepared     completely prepared
2    completely prepared           very prepared         very prepared           very prepared
3               prepared           very prepared   completely prepared     completely prepared
4          very prepared           very prepared         very prepared           very prepared
5                   <NA>                    <NA>                  <NA>                    <NA>
6    completely prepared     completely prepared   completely prepared     completely prepared
7    completely prepared     completely prepared   completely prepared     completely prepared
8    completely prepared     completely prepared         very prepared           very prepared
9    completely prepared     completely prepared   completely prepared     completely prepared
10         very prepared           very prepared         very prepared           very prepared
11   completely prepared     completely prepared         very prepared           very prepared
12   completely prepared     completely prepared   completely prepared     completely prepared
13   completely prepared           very prepared         very prepared           very prepared
14              prepared                prepared              prepared                prepared
15         very prepared           very prepared         very prepared           very prepared
16         very prepared           very prepared         very prepared           very prepared
17   completely prepared     completely prepared   completely prepared     completely prepared
18   completely prepared     completely prepared         very prepared           very prepared
19                  <NA>                    <NA>                  <NA>                    <NA>
20   completely prepared     completely prepared   completely prepared           very prepared
21         very prepared           very prepared         very prepared                prepared
22                  <NA>                    <NA>                  <NA>                    <NA>
23              prepared                prepared              prepared                prepared

Each column looks like it's stored as a factor:

> str(items)
'data.frame':   23 obs. of  4 variables:
 $ RTAPOSTPrep1_PDschool  : Factor w/ 3 levels "completely prepared",..: 1 1 2 3 NA 1 1 1 1 3 ...
 $ RTAPOSTPrep2_Pddistrict: Factor w/ 3 levels "completely prepared",..: 1 3 3 3 NA 1 1 1 1 3 ...
 $ RTAPOSTPrep3_Pdregion  : Factor w/ 3 levels "completely prepared",..: 1 3 1 3 NA 1 1 3 1 3 ...
 $ RTAPOSTPrep4_PDnational: Factor w/ 3 levels "completely prepared",..: 1 3 1 3 NA 1 1 3 1 3 ...

I'd like to use the package "likert" to analyze this data, but when I do the levels are out of order:

> likert(items)
                     Item completely prepared prepared very prepared
1   RTAPOSTPrep1_PDschool                  60       15            25
2 RTAPOSTPrep2_Pddistrict                  50       10            40
3   RTAPOSTPrep3_Pdregion                  40       10            50
4 RTAPOSTPrep4_PDnational                  35       15            50

I would like there to be five levels in the following order: not at all prepared, a little prepared, prepared, very prepared, completely prepared. But when I try to manipulate the levels on "items" in any way, I get an error saying that the command is only for factors. If I use $ to pull out a single column (e.g. items$RTAPOSTPrep1_PDschool), I can manipulate the levels of that factor, but I usually have to do this for dozens of columns, so I would like a way to quickly relevel all the columns so that they all have the same five levels in the same order. My best attempt at this was:

> apply(items,2,function(x) relevel(x, ref="prepared"))
Error in relevel.default(x, ref = "prepared") : 
  'relevel' only for factors

I suspect I just have a bad understanding of how factors work, and how extracting data from data frames works (I'm pretty new to R). Could somebody please help? I have spent an inordinate amount of time trying to do this.

Upvotes: 3

Views: 317

Answers (2)

Thomas K

Reputation: 3311

Extracting data

I personally prefer dplyr over base R:

library(dplyr)
df %>% 
  select(contains("RTAPOSTPrep")) # selects all the columns whose names contain "RTAPOSTPrep"
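
One caveat, sketched below in base R so it runs without extra packages: matching names that contain a string is looser than the prefix test in the question (substr(names(df), 1, 11) == "RTAPOSTPrep"). The data frame df_demo and its column names are made up for illustration. In dplyr, starts_with() would be the helper closer to the prefix test.

```r
# Hypothetical column names, for illustration only
df_demo <- data.frame(
  RTAPOSTPrep1_PDschool   = 1:2,
  RTAPOSTPrep2_Pddistrict = 3:4,
  Other_RTAPOSTPrep_note  = 5:6,
  id                      = 7:8
)

prefix <- "RTAPOSTPrep"

# Prefix match: same idea as the substr() test in the question
by_prefix <- names(df_demo)[startsWith(names(df_demo), prefix)]

# "Contains" match: also picks up names with the string in the middle
by_contains <- names(df_demo)[grepl(prefix, names(df_demo), fixed = TRUE)]

print(by_prefix)
print(by_contains)
```

If none of your other column names include the string, both approaches select the same columns.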

Releveling factors

Cookbook for R gives a good introduction.

You could use:

# sample data
var1 <- factor(c("not at all prepared", "prepared"))
var2 <- factor(c("prepared", "very prepared"))
df <- data.frame(var1, var2)
lapply(df, levels)
# $var1
# [1] "not at all prepared" "prepared"           

# $var2
# [1] "prepared"      "very prepared"


# create vector with correct order
levels <- c("not at all prepared", "a little prepared", "prepared",
            "very prepared", "completely prepared")

# as_tibble() replaces the deprecated as_data_frame()
new_df <- lapply(df, function(x) factor(x, levels = levels)) %>% 
  as_tibble()

lapply(new_df, levels)
# $var1
# [1] "not at all prepared" "a little prepared"   "prepared"            "very prepared"       "completely prepared"

# $var2
# [1] "not at all prepared" "a little prepared"   "prepared"            "very prepared"       "completely prepared"

Update: If you don't want a new data.frame but would rather modify the existing one in place, pcantalupo's approach works great:

df[] <- lapply(df, function(x) factor(x, levels = levels))
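
As for why the apply() attempt in the question fails: apply() first converts the data frame to a matrix, and a matrix built from factor columns is a character matrix, so relevel() never sees a factor. lapply() and sapply() hand each column to the function unchanged. A minimal sketch with made-up data (demo_items is a hypothetical name):

```r
# Small stand-in for the "items" data frame in the question
demo_items <- data.frame(
  q1 = factor(c("prepared", "very prepared", NA)),
  q2 = factor(c("completely prepared", "prepared", "very prepared"))
)

# apply() works on as.matrix(demo_items), so each column arrives as character
via_apply <- apply(demo_items, 2, class)

# sapply()/lapply() iterate over the columns as they are stored
via_sapply <- sapply(demo_items, class)

print(via_apply)
print(via_sapply)
```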

Upvotes: 1

pcantalupo

Reputation: 2226

First, create a vector to hold the levels in the order that you want:

lvl = c("not at all prepared", "a little prepared", "prepared", "very prepared", "completely prepared")

Below, I create an example data frame and show that the levels are out of order:

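# stringsAsFactors = TRUE is needed in R >= 4.0, where strings are no longer converted to factors by default
d <- data.frame(a = sample(lvl, 15, replace = TRUE), b = sample(lvl, 15, replace = TRUE),
                stringsAsFactors = TRUE)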
levels(d$a)

[1] "a little prepared"   "completely prepared" "not at all prepared" "very prepared"   

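Then, use lapply to re-create each column as a factor with your specified levels, and assign the result back to the original data.frame. Assigning to d[] rather than d keeps the data.frame structure instead of replacing it with a plain list:

d[] <- lapply(d, function(x) factor(x, levels = lvl))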
levels(d$a)

[1] "not at all prepared" "a little prepared"   "prepared"            "very prepared"      
[5] "completely prepared"
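
If only some columns of a larger data frame should be changed, the same idea works on a subset. The sketch below uses a made-up data frame (survey, with an extra id column) and assumes the columns of interest share the "RTAPOSTPrep" prefix from the question. It also checks for labels that are not in lvl, because factor() turns any value missing from levels into NA without warning.

```r
lvl <- c("not at all prepared", "a little prepared", "prepared",
         "very prepared", "completely prepared")

# Hypothetical data: one id column plus two Likert items
survey <- data.frame(
  id = 1:4,
  RTAPOSTPrep1_PDschool   = factor(c("completely prepared", "prepared", NA, "very prepared")),
  RTAPOSTPrep2_Pddistrict = factor(c("very prepared", "very prepared", NA, "prepared"))
)

# Logical index of the columns to relevel
cols <- startsWith(names(survey), "RTAPOSTPrep")

# Guard: labels not listed in lvl (e.g. typos) would silently become NA
stray <- setdiff(unlist(lapply(survey[cols], levels)), lvl)
stopifnot(length(stray) == 0)

# Relevel only the selected columns; id is left untouched
survey[cols] <- lapply(survey[cols], factor, levels = lvl)

str(survey)
```

Levels that nobody chose are kept as empty levels, and existing NA values stay NA.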

Upvotes: 1
