I am working with Likert data. I pulled four columns from my data frame with the following:
items <- df[, substr(names(df), 1, 11) == "RTAPOSTPrep"]
The result looks like this:
> items
   RTAPOSTPrep1_PDschool RTAPOSTPrep2_Pddistrict RTAPOSTPrep3_Pdregion RTAPOSTPrep4_PDnational
1    completely prepared     completely prepared   completely prepared     completely prepared
2    completely prepared           very prepared         very prepared           very prepared
3               prepared           very prepared   completely prepared     completely prepared
4          very prepared           very prepared         very prepared           very prepared
5                   <NA>                    <NA>                  <NA>                    <NA>
6    completely prepared     completely prepared   completely prepared     completely prepared
7    completely prepared     completely prepared   completely prepared     completely prepared
8    completely prepared     completely prepared         very prepared           very prepared
9    completely prepared     completely prepared   completely prepared     completely prepared
10         very prepared           very prepared         very prepared           very prepared
11   completely prepared     completely prepared         very prepared           very prepared
12   completely prepared     completely prepared   completely prepared     completely prepared
13   completely prepared           very prepared         very prepared           very prepared
14              prepared                prepared              prepared                prepared
15         very prepared           very prepared         very prepared           very prepared
16         very prepared           very prepared         very prepared           very prepared
17   completely prepared     completely prepared   completely prepared     completely prepared
18   completely prepared     completely prepared         very prepared           very prepared
19                  <NA>                    <NA>                  <NA>                    <NA>
20   completely prepared     completely prepared   completely prepared           very prepared
21         very prepared           very prepared         very prepared                prepared
22                  <NA>                    <NA>                  <NA>                    <NA>
23              prepared                prepared              prepared                prepared
The columns appear to be stored as factors:
> str(items)
'data.frame': 23 obs. of 4 variables:
$ RTAPOSTPrep1_PDschool : Factor w/ 3 levels "completely prepared",..: 1 1 2 3 NA 1 1 1 1 3 ...
$ RTAPOSTPrep2_Pddistrict: Factor w/ 3 levels "completely prepared",..: 1 3 3 3 NA 1 1 1 1 3 ...
$ RTAPOSTPrep3_Pdregion : Factor w/ 3 levels "completely prepared",..: 1 3 1 3 NA 1 1 3 1 3 ...
$ RTAPOSTPrep4_PDnational: Factor w/ 3 levels "completely prepared",..: 1 3 1 3 NA 1 1 3 1 3 ...
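The alphabetical order in the str() output is R's default: when a factor is created without explicit levels, its levels are the sorted unique values, so labels nobody chose (here "not at all prepared" and "a little prepared") never become levels at all. A minimal sketch with made-up responses (x, f_default, f_fixed and lvl are illustrative names, not taken from the question's data):

```r
# Made-up responses that use only three of the five labels
x <- c("very prepared", "completely prepared", "prepared", "completely prepared")

# No levels given: R uses the sorted unique values it sees
f_default <- factor(x)
levels(f_default)
# "completely prepared" "prepared" "very prepared"

# Explicit levels: all five labels, in scale order, even if unobserved
lvl <- c("not at all prepared", "a little prepared", "prepared",
         "very prepared", "completely prepared")
f_fixed <- factor(x, levels = lvl)
levels(f_fixed)
table(f_fixed)   # the two unused labels are listed with a count of 0
```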
I'd like to use the "likert" package to analyze this data, but when I do, the levels come out in alphabetical order rather than in scale order:
> likert(items)
                     Item completely prepared prepared very prepared
1   RTAPOSTPrep1_PDschool                  60       15            25
2 RTAPOSTPrep2_Pddistrict                  50       10            40
3   RTAPOSTPrep3_Pdregion                  40       10            50
4 RTAPOSTPrep4_PDnational                  35       15            50
I would like every column to have the same five levels, in this order: not at all prepared, a little prepared, prepared, very prepared, completely prepared. But when I try to manipulate the levels of "items" in any way, I get an error saying that the command is only for factors. If I use $ to pull out a single column (e.g. items$RTAPOSTPrep1_PDschool), I can change the levels of that factor, but I usually have to do this for dozens of columns, so I would like a way to quickly relevel all of them to the same five levels in the same order. My best attempt was:
> apply(items,2,function(x) relevel(x, ref="prepared"))
Error in relevel.default(x, ref = "prepared") :
'relevel' only for factors
I suspect I just don't understand how factors work, or how extracting data from data frames works (I'm pretty new to R). Could somebody please help? I have spent an inordinate amount of time trying to do this.
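The error comes from apply() rather than from relevel(): apply() converts a data frame to a matrix first, and a data frame containing factors becomes a character matrix, so each x handed to the function is a plain character vector. relevel() would not do the job anyway, because it only moves one existing level to the front and cannot add missing ones. A short sketch on a hypothetical two-column data frame (the columns q1 and q2 are made up):

```r
# Hypothetical data frame with two factor columns
items <- data.frame(
  q1 = factor(c("prepared", "very prepared")),
  q2 = factor(c("completely prepared", "prepared"))
)

# apply() converts the data frame with as.matrix(); factors become character
m <- as.matrix(items)
class(m[, "q1"])          # "character"

# So inside apply() every column arrives as a character vector
apply(items, 2, class)    # both "character"

# lapply()/sapply() hand over the columns unchanged, so they stay factors
sapply(items, class)      # both "factor"

# relevel() only moves an existing level to the front; it adds nothing
levels(relevel(items$q1, ref = "very prepared"))
# "very prepared" "prepared"
```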
I personally prefer dplyr over base R for selecting the columns:
library(dplyr)
df %>%
  select(contains("RTAPOSTPrep"))  # selects all columns whose names contain "RTAPOSTPrep"
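For comparison, a base R sketch of the same selection on a made-up data frame. startsWith() (R 3.3.0 or later) mirrors the substr() test in the question; grepl() with ignore.case = TRUE is closer to contains(), which ignores case by default. drop = FALSE keeps the result a data frame even if only one column matches:

```r
# Made-up data frame: two survey items plus an unrelated ID column
df <- data.frame(
  ID = 1:2,
  RTAPOSTPrep1_PDschool   = c("prepared", "very prepared"),
  RTAPOSTPrep2_Pddistrict = c("very prepared", "prepared")
)

# Same test as the question's substr() call: names starting with the prefix
items_start <- df[, startsWith(names(df), "RTAPOSTPrep"), drop = FALSE]

# Closer to contains(): prefix anywhere in the name, case ignored
items_any <- df[, grepl("RTAPOSTPrep", names(df), ignore.case = TRUE), drop = FALSE]

names(items_start)
# "RTAPOSTPrep1_PDschool" "RTAPOSTPrep2_Pddistrict"
```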
The Cookbook for R gives a good introduction.
You could use:
# sample data
var1 <- factor(c("not at all prepared", "prepared"))
var2 <- factor(c("prepared", "very prepared"))
df <- data.frame(var1, var2)
lapply(df, levels)
# $var1
# [1] "not at all prepared" "prepared"
# $var2
# [1] "prepared" "very prepared"
# create vector with correct order
levels <- c("not at all prepared", "a little prepared", "prepared",
"very prepared", "completely prepared")
new_df <- lapply(df, function(x) factor(x, levels = levels)) %>%
  as_tibble()  # as_data_frame() is deprecated in current tibble/dplyr
lapply(new_df, levels)
# $var1
# [1] "not at all prepared" "a little prepared" "prepared" "very prepared" "completely prepared"
# $var2
# [1] "not at all prepared" "a little prepared" "prepared" "very prepared" "completely prepared"
Update: if you don't want a new data.frame but want to modify df in place, pcantalupo's approach works great:
df[] <- lapply(df, function(x) factor(x, levels = levels))
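If the Likert items live inside a larger data frame with other columns, the same idea works on just the matching columns, so there is no need to pull them out first. A minimal sketch, assuming every item column shares the "RTAPOSTPrep" prefix (the data frame and the cols name below are made up):

```r
lvl <- c("not at all prepared", "a little prepared", "prepared",
         "very prepared", "completely prepared")

# Made-up data frame: an ID column plus two Likert items
df <- data.frame(
  ID = 1:3,
  RTAPOSTPrep1_PDschool   = factor(c("prepared", "very prepared", NA)),
  RTAPOSTPrep2_Pddistrict = factor(c("completely prepared", "prepared", NA))
)

# Logical index of the item columns
cols <- startsWith(names(df), "RTAPOSTPrep")

# Assigning into df[cols] keeps df a data.frame and leaves ID untouched
df[cols] <- lapply(df[cols], function(x) factor(x, levels = lvl))

str(df)
```

Passing ordered = TRUE to factor() gives ordered factors instead, which allows comparisons such as x >= "prepared" if you need them later.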
First, create a vector to hold the levels in the order that you want:
lvl = c("not at all prepared", "a little prepared", "prepared", "very prepared", "completely prepared")
Below, I create an example data frame and show that the levels are out of order:
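d <- data.frame(a = sample(lvl, 15, replace = TRUE), b = sample(lvl, 15, replace = TRUE), stringsAsFactors = TRUE)  # since R 4.0.0 strings are not converted to factors by default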
levels(d$a)
[1] "a little prepared" "completely prepared" "not at all prepared" "very prepared"
Then, use lapply to refactor each column with your specified levels and assign the result back into the original data.frame (assigning to d[] keeps it a data.frame):
d[] <- lapply(d, function(x) factor(x, levels = lvl))
levels(d$a)
[1] "not at all prepared" "a little prepared" "prepared" "very prepared"
[5] "completely prepared"
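One caveat with factor(x, levels = lvl) in both answers: any value that is not spelled exactly as in lvl (different capitalisation, a typo, a trailing space) is silently turned into NA. A sketch of a check to run before refactoring, on hypothetical data with two deliberate slips (all_values and unexpected are illustrative names):

```r
lvl <- c("not at all prepared", "a little prepared", "prepared",
         "very prepared", "completely prepared")

# Hypothetical items with a capitalisation slip and a typo
items <- data.frame(
  q1 = c("prepared", "Very prepared", NA),
  q2 = c("completely prepared", "prepared", "very prepard")
)

# All non-missing labels that are not in lvl
all_values <- unlist(lapply(items, as.character), use.names = FALSE)
unexpected <- setdiff(all_values[!is.na(all_values)], lvl)
unexpected
# "Very prepared" "very prepard"

# Refactoring anyway turns those labels into NA without a warning
items[] <- lapply(items, function(x) factor(x, levels = lvl))
colSums(is.na(items))   # q1 = 2, q2 = 1
```

In a real script you might wrap the check in if (length(unexpected) > 0) stop(...) so bad labels are caught instead of quietly becoming missing values.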