How to convert a factor variable into a numeric one in R

Question

I've run into another problem and hope you can help. I've already searched Google, asked a friend and tried to understand similar questions on this site, but I still can't figure it out.

OK, so here's my problem: I have a large data set that covers the years 1980 to 2012. I used the read.spss function to get the data into R:

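library(foreign)  # read.spss() comes from the foreign package
# use / (or \\) in Windows paths: a single \ starts an escape sequence in R strings
rohdaten <- read.spss("C:/Users/xxxxxxx.sav", use.value.labels = TRUE, to.data.frame = TRUE,
        max.value.labels = Inf, trim.factor.names = FALSE,
        trim_values = TRUE, reencode = NA, use.missings = TRUE)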

That seems to work. Next, I'd like to analyze variable 14 (v14), a Likert scale ranging from "totally agree" to "don't agree at all", which is therefore coded as a factor. I want to compare how the responses to this item change over time, so I want to calculate its mean, and for that it needs to be numeric. That's the first step of the problem.

According to R for Dummies, I need to convert the factor to character first and then to numeric. Here's my code. First I tried the recode() function, which didn't work; then I created a new object, econ, that contains a copy of variable 14 (so I don't affect the original v14 data in the workspace).

rohdaten$v14_2 <- recode(rohdaten$v14, "8 = NA; 9 = NA; 0 = NA; 1 = 1; 2 = 2; 3 = 3;  4 = 4; 5 = 5; as.factor.result = FALSE")  #should recode already - kinda doesn't work
class(rohdaten$v14_2) #just tells me it's a factor...
str(rohdaten$v14_2)
econ <- rohdaten$v14_2
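A likely reason recode() seemed to do nothing: with use.value.labels = TRUE, read.spss() turns labelled variables into factors whose levels are the value labels (such as "totally agree"), so a specification written in terms of the numeric codes 0, 8 and 9 has nothing to match. Note also that as.factor.result = FALSE sits inside the quoted recode specification instead of being passed as a separate argument. If the numeric codes are what you want, one option is to read the file with use.value.labels = FALSE and set the missing codes to NA in base R. The sketch below uses a made-up vector of codes and assumes, as the recode() call suggests, that 0, 8 and 9 mean "missing"; if those codes are declared as user-missing in the .sav file, use.missings = TRUE may already have turned them into NA.

```r
# Made-up numeric codes standing in for v14 read with use.value.labels = FALSE.
# Assumption: 0, 8 and 9 are missing-value codes; 1 to 5 are the scale points.
v14 <- c(1, 2, 8, 5, 0, 9, 3)

v14_2 <- v14
v14_2[v14_2 %in% c(0, 8, 9)] <- NA   # replace the assumed missing codes with NA

v14_2                        # 1 2 NA 5 NA NA 3
mean(v14_2, na.rm = TRUE)    # 2.75, the mean of the valid answers
```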

With the R for Dummies advice in mind, I convert the values to character and then to numeric:

str(econ)
as.character(econ)
head(econ)
econ <- as.numeric(econ)
head(econ)

For some reason this gives me a "good" result, despite the "error" (??) in the as.character() line. If I use econ <- as.character(econ) instead, I get "Warning message: NAs introduced by coercion" after the econ <- as.numeric(econ) command.
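The "good" result has a simple explanation: as.character(econ) on its own line only prints the converted values. Without an assignment econ stays a factor, so the later as.numeric(econ) returns the factor's internal level indices rather than scale scores. With the assignment, econ holds the label text, which as.numeric() cannot parse, hence the NAs. A small illustration with made-up labels (not the real v14 levels):

```r
# Made-up Likert labels standing in for v14
econ <- factor(c("totally agree", "agree", "totally agree"))

as.character(econ)   # the labels are printed, then discarded
class(econ)          # still "factor", because nothing was assigned

as.numeric(econ)     # level indices; levels sort alphabetically here, giving 2 1 2

chr <- as.character(econ)
as.numeric(chr)      # NA NA NA, with "NAs introduced by coercion"
```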

OK, so far it seems to work somehow, I guess?

But then I want to calculate the mean for every year (stored in variable 2). I came across the by() function, which looked like it does exactly what I want, so my code turned out to be:

avgEconRat <- by(data = rohdaten, INDICES = rohdaten$v2, FUN = mean, na.rm = T)
head(avgEconRat)  # gives me some means, but I'm not sure whether they are the real means or the means of the factor codes mentioned on the For Dummies site (sorry, I can't explain it better)

Now I seem to have the data in the avgEconRat object, but first, I'm not sure whether my means are correct at all, and second (and this is the main issue), how do I refer to this data now to plot it?
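About the means: by() hands each year's subset of the whole rohdaten data frame to FUN, and in current versions of R mean() of a data frame returns NA with the warning "argument is not numeric or logical". To average a single numeric column by year, aggregate() or tapply() may be simpler, and aggregate() returns an ordinary data frame, which is what ggplot() expects as data. A sketch with a made-up data frame, assuming v2 holds the year and econ the numeric score:

```r
# Made-up stand-in for rohdaten: v2 = survey year, econ = numeric Likert score
rohdaten_demo <- data.frame(
  v2   = c(1980, 1980, 1981, 1981, 1981),
  econ = c(1, 3, 2, NA, 4)
)

# One row per year with the mean score; the formula interface drops NA rows
avgEcon <- aggregate(econ ~ v2, data = rohdaten_demo, FUN = mean)
avgEcon   # v2 = 1980, 1981; econ = 2, 3

# The same means as a named vector
tapply(rohdaten_demo$econ, rohdaten_demo$v2, mean, na.rm = TRUE)
```

Because avgEcon is a data frame, it can be plotted directly, for example with ggplot(avgEcon, aes(x = v2, y = econ)) + geom_line().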

library(ggplot2)
p1 <- ggplot(rohdaten, aes(v14, v2))  # ggplot() has no na.action argument
p1 + geom_point(aes(color = v652), alpha = 0.6) +
      facet_grid(. ~ v5)

That's the code I had in mind. I know I'd have to replace rohdaten with econ now, but since I have no idea how econ is structured (and don't really know how to find out), I'm stuck here :-/ I feel like I have (or might have, depending on whether my means are the right ones) the data I need, but I've lost access to it.
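To find out what an object is, str() (together with class() and head()) is usually the quickest check. Here econ is a plain numeric vector with one value per row of rohdaten, not a data frame, which is why it cannot simply replace rohdaten in the ggplot() call. One way to keep access to it is to store it back as a column, after which the existing ggplot() call can keep using rohdaten, e.g. with aes(x = v2, y = econ). A sketch with a made-up data frame (column names mirror the question; the values are invented):

```r
# Made-up stand-in for rohdaten: v2 = year, v14 = Likert answer as a factor
rohdaten_demo <- data.frame(
  v2  = c(1980, 1980, 1981),
  v14 = factor(c("agree", "disagree", "agree"))
)
econ <- c(1, 4, 1)            # invented numeric scores, one per row

str(econ)                     # num [1:3] 1 4 1, i.e. a plain vector
class(econ)                   # "numeric"
length(econ) == nrow(rohdaten_demo)   # TRUE: one score per row

rohdaten_demo$econ <- econ    # store the scores as a new column
str(rohdaten_demo)            # now a data frame with columns v2, v14 and econ
```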

Sorry for my odd questions, but learning programming without real mentoring or any previous experience is tough.

Thank you very much for your patience, time and help!

Jthorpe · Accepted Answer

First, here's why you would have to convert to character before converting to numeric:

Let's say we have a factor that contains a handful of numbers:

x = factor(c(1,2,7,7))

You can inspect how this is represented in R like so:

unclass(x)
#> [1] 1 2 3 3
#> attr(,"levels")
#> [1] "1" "2" "7"

You can see that there are 3 levels and that the values are stored as indexes into those 3 levels. Furthermore, if you call as.numeric() directly, you get the index vector and not the values you were hoping for:

as.numeric(x)
#> [1] 1 2 3 3
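When the labels really are numbers, as here, the usual ways to get the original values back go through the labels: as.numeric(as.character(x)), or the variant given in the R FAQ entry on converting factors to numeric, as.numeric(levels(x))[x], which converts each level only once and then indexes by the factor's codes:

```r
x <- factor(c(1, 2, 7, 7))

as.numeric(x)                 # 1 2 3 3: the internal level indices
as.numeric(as.character(x))   # 1 2 7 7: labels first, then numbers
as.numeric(levels(x))[x]      # 1 2 7 7: convert the 3 levels, then index by the codes
```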

On the other hand, if you have a Likert scale and the factor levels are in the correct order:

f = factor(c("agree","agree","somewhat agree","somewhat agree","somewhat disagree","disagree","disagree"),
           # specify the levels in the order of the scale:
           levels=c("agree","somewhat agree","somewhat disagree","disagree"))

levels(f)
#> [1] "agree" "somewhat agree" "somewhat disagree" "disagree"

you may actually want the index:

as.numeric(f)
#> [1] 1 1 2 2 3 4 4

If, however, your levels are out of order, as in:

f = factor(sample(c("agree","somewhat agree","somewhat disagree","disagree"),
                  20,
                  TRUE))
levels(f)
#> [1] "agree" "disagree" "somewhat agree" "somewhat disagree"

then, instead of calling as.numeric(as.character(f)) (which makes no sense in this case: the labels are words, so every value would become NA), you'll want to re-order the factor levels and then call as.numeric(), like so:

as.numeric(factor(f,
                  # specify the levels in the correct order:
                  levels=c("agree","somewhat agree","somewhat disagree","disagree")))
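A related sketch with invented responses: match() gives the same scores as re-ordering the levels, because it returns each label's position in the vector of levels. Either way, any response whose spelling does not exactly match one of the listed levels becomes NA, so checking for unexpected NAs afterwards is a cheap safeguard.

```r
likert_levels <- c("agree", "somewhat agree", "somewhat disagree", "disagree")
f <- factor(c("agree", "disagree", "somewhat agree", "agree"))

# Re-order the levels, then take the indices
score <- as.numeric(factor(f, levels = likert_levels))

# Same scores: position of each label within likert_levels
score2 <- match(as.character(f), likert_levels)

score                       # 1 4 2 1
all(score == score2)        # TRUE
anyNA(score)                # FALSE: every label matched one of the levels
```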
