Jason Deutsch

Reputation: 61

I'm trying to convert a data frame column to a factor, but I'm having issues:

> summary(CA_extract[2])
 REPORTING_YEAR
 Min.   :1990  
 1st Qu.:1995  
 Median :2010  
 Mean   :2007  
 3rd Qu.:2017  
 Max.   :2019  
> table(CA_extract[2])

1990 1995 2000 2005 2010 2015 2016 2017 2018 2019 
9081 5335 5787 5685 4888 4644 4590 4606 4581 4517 
> nrow(CA_extract)
[1] 53714
> ncol(CA_extract)
[1] 20
> class(CA_extract[2])
[1] "data.frame"
> summarise(CA_extract[2])
data frame with 0 columns and 1 row
> as.factor(CA_extract[2])
REPORTING_YEAR 
          <NA> 
Levels: c(1990, 1995, 2000, 2005, 2010, 2015, 2016, 2017, 2018, 2019)

> is.numeric(CA_extract[2])
[1] FALSE
> is.character(CA_extract[2])
[1] FALSE
> is.list(CA_extract[2])
[1] TRUE
> is.double(CA_extract[2])
[1] FALSE
> is.factor(CA_extract[2])
[1] FALSE
> is.vector(CA_extract[2])
[1] FALSE

I've been trying to figure out how to change it to a factor. From what I can tell the data should allow it, but every time I run it I get a funky column with a stack of NAs. Any help would be great. I was able to get it to work in an isolated case, but I lost the solution and wasn't able to integrate it into a for loop.

Let me know if you need more info. I don't know how to provide reproducible data without you downloading the same dataset I did (it's publicly available).

Upvotes: 1

Views: 372

Answers (2)

Brendan A.

Reputation: 1268

Short answer: as.factor(CA_extract[[2]])

The problem has to do with how you're referencing the column in your data frame by using only single brackets. See this answer (and the relevant section in the R documentation) for a nice explanation of the differences in indexing methods.

Using single brackets to index your data frame returns another data frame, as you saw with your test class(CA_extract[2]). Compare the outputs of str(CA_extract[2]) with str(CA_extract[[2]]) and the difference should be clear.
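
To make the difference concrete, here is a minimal sketch on a toy data frame. The data frame `df` and its `id` column are made up for illustration; only the `REPORTING_YEAR` column name comes from the question.

```r
# Toy stand-in for CA_extract (hypothetical data, not the real dataset)
df <- data.frame(id = 1:6,
                 REPORTING_YEAR = c(1990, 1995, 1990, 2000, 1995, 1990))

single <- df[2]     # single brackets: still a one-column data frame
double <- df[[2]]   # double brackets: the underlying numeric vector

class(single)
class(double)

# Converting the vector (not the data frame) gives one level per distinct year
year_factor <- as.factor(double)
levels(year_factor)
table(year_factor)
```

Passing `single` to `as.factor()` instead is what produces the single `<NA>` with one long level, because the whole data frame is treated as a one-element list.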

Upvotes: 1

Luca

Reputation: 36

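You missed a comma. With a plain data frame, CA_extract[, 2] returns the column itself rather than a one-column data frame:

CA_extract[, 2] <- factor(CA_extract[, 2])

Alternatively, refer to the column by name: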

CA_extract$varname <- factor(CA_extract$varname)
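
Since the question mentions a for loop, the same assignment can be repeated over several columns. This is only a sketch: the toy data frame and the `REGION` and `VALUE` columns are assumptions, not part of the real dataset. It uses double brackets, which should also behave the same way if the data is a tibble rather than a plain data frame.

```r
# Toy data; REGION and VALUE are hypothetical columns for illustration
df <- data.frame(REPORTING_YEAR = c(1990, 1995, 1990, 2000),
                 REGION = c("north", "south", "north", "east"),
                 VALUE = c(1.5, 2.0, 3.25, 4.0))

# Names of the columns that should become factors
cols_to_convert <- c("REPORTING_YEAR", "REGION")

for (col in cols_to_convert) {
  # [[ ]] extracts and replaces the column vector itself
  df[[col]] <- factor(df[[col]])
}

str(df)
```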

Upvotes: 0
