Reputation: 61
> summary(CA_extract[2])
REPORTING_YEAR
Min. :1990
1st Qu.:1995
Median :2010
Mean :2007
3rd Qu.:2017
Max. :2019
> table(CA_extract[2])
1990 1995 2000 2005 2010 2015 2016 2017 2018 2019
9081 5335 5787 5685 4888 4644 4590 4606 4581 4517
> nrow(CA_extract)
[1] 53714
> ncol(CA_extract)
[1] 20
> class(CA_extract[2])
[1] "data.frame"
> summarise(CA_extract[2])
data frame with 0 columns and 1 row
> as.factor(CA_extract[2])
REPORTING_YEAR
<NA>
Levels: c(1990, 1995, 2000, 2005, 2010, 2015, 2016, 2017, 2018, 2019)
> is.numeric(CA_extract[2])
[1] FALSE
> is.character(CA_extract[2])
[1] FALSE
> is.list(CA_extract[2])
[1] TRUE
> is.double(CA_extract[2])
[1] FALSE
> is.factor(CA_extract[2])
[1] FALSE
> is.vector(CA_extract[2])
[1] FALSE
I've been trying to figure out how to change this column to a factor. From what I can tell the data should allow it, but every time I run it I get a funky column with a stack of NAs. Any help would be great. I was able to get it to work in an isolated case, but I lost the solution and wasn't able to integrate it into a for loop.
Let me know if you need more info. I don't know how to provide reproducible data without you downloading the same dataset I did (it's publicly available).
Upvotes: 1
Views: 372
Reputation: 1268
Short answer: as.factor(CA_extract[[2]])
The problem has to do with how you're referencing the column in your data frame by using only single brackets. See this answer (and the relevant section in the R documentation) for a nice explanation of the differences in indexing methods.
Using single brackets to index your data frame returns another data frame, as you saw with your test class(CA_extract[2]). Compare the output of str(CA_extract[2]) with that of str(CA_extract[[2]]) and the difference should be clear.
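A minimal sketch of the difference, using a small made-up data frame in place of CA_extract (the name toy and its values are assumptions for illustration, not the real dataset). It also assigns the result back to the column, which the short answer above leaves implicit:

```r
# Toy stand-in for CA_extract; the values below are made up for illustration.
toy <- data.frame(
  ID = 1:6,
  REPORTING_YEAR = c(1990, 1995, 1990, 2000, 1995, 1990)
)

class(toy[2])      # "data.frame": single brackets keep the data frame wrapper
class(toy[[2]])    # "numeric": double brackets extract the column itself

# Convert the extracted vector and assign it back to the same column.
toy[[2]] <- as.factor(toy[[2]])

levels(toy$REPORTING_YEAR)   # "1990" "1995" "2000"
table(toy$REPORTING_YEAR)    # counts per year, now keyed by factor level
```

Without the assignment, as.factor() only prints the converted vector and the data frame itself is left unchanged.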
Upvotes: 1
Reputation: 36
You missed a comma:
CA_extract[, 2] <- factor(CA_extract[, 2])
Alternatively, refer to the column by name:
CA_extract$varname <- factor(CA_extract$varname)
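Since the question mentions a for loop, here is a rough sketch of converting several columns at once. The data frame toy and the columns REGION and VALUE are hypothetical. One caveat: the comma form drops to a vector only for a base data.frame. If CA_extract happens to be a tibble (the question doesn't say), CA_extract[, 2] is still a data frame, so [[ ]] or $ is the safer choice inside a loop:

```r
# Toy data; column names other than REPORTING_YEAR are hypothetical.
toy <- data.frame(
  REPORTING_YEAR = c(1990, 1995, 1990, 2000),
  REGION = c("north", "south", "north", "east"),
  VALUE = c(1.5, 2.0, 3.25, 4.0),
  stringsAsFactors = FALSE
)

# With a base data.frame, toy[, 1] drops to a plain vector...
is.vector(toy[, 1])                     # TRUE
# ...but drop = FALSE keeps it as a one-column data frame.
is.data.frame(toy[, 1, drop = FALSE])   # TRUE

# Convert a chosen set of columns in a loop, using [[ ]] so each
# column is handled as a vector regardless of the data frame class.
cols_to_convert <- c("REPORTING_YEAR", "REGION")
for (col in cols_to_convert) {
  toy[[col]] <- factor(toy[[col]])
}

sapply(toy, class)   # class of every column after the conversion
```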
Upvotes: 0