Reputation: 2763
When I apply as.numeric or as.integer to a column, the values change. Why is this? For example:
test <- data.frame(structure(c("52053,34", "79032,83", "20679,06", "20799,56", "20679,06",
"21279,45", "51789,44", "54189,45", "73138,89", "73138,89"), .Dim = c(10L,
1L)))
names(test)[names(test) == "structure.c..52053.34....79032.83....20679.06....20799.56....20679.06..."] <- "column"
test$b <- as.numeric(test$column)
test$c <- as.integer(test$column)
Upvotes: 0
Views: 2295
Reputation: 269441
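test$column is a factor. (Before R 4.0.0, data.frame converted character columns to factors by default; in later versions, pass stringsAsFactors = TRUE to reproduce this.)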
class(test$column)
## [1] "factor"
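Whether the column is a factor at all depends on the stringsAsFactors argument of data.frame(). The sketch below compares both settings on a short hypothetical vector v taken from the question's data:

```r
# Hypothetical stand-in for the question's comma-decimal strings
v <- c("52053,34", "79032,83", "20679,06")

as_factor <- data.frame(column = v, stringsAsFactors = TRUE)
as_char   <- data.frame(column = v, stringsAsFactors = FALSE)

class(as_factor$column)  # "factor"
class(as_char$column)    # "character"

# On a character column, as.numeric() cannot parse the decimal comma,
# so it returns NA (with a coercion warning) instead of level codes.
suppressWarnings(as.numeric(as_char$column))  # NA NA NA
```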
levels(test$column) shows the labels of the factor's levels:
levels(test$column)
## [1] "20679,06" "20799,56" "21279,45" "51789,44" "52053,34" "54189,45" "73138,89"
## [8] "79032,83"
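Internally, the factor stores integer codes (5, 8, 1, and so on) that index into the levels, and as.numeric and as.integer return those codes rather than the labels: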
unclass(test$column)
## [1] 5 8 1 2 1 3 4 6 7 7
## attr(,"levels")
## [1] "20679,06" "20799,56" "21279,45" "51789,44" "52053,34" "54189,45" "73138,89"
## [8] "79032,83"
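To see directly that as.integer() and as.numeric() return the codes, and that indexing the levels by those codes recovers the labels, here is a minimal sketch on a small hypothetical factor f:

```r
# Small hypothetical factor; its levels sort as "20679,06" < "52053,34" < "79032,83"
f <- factor(c("52053,34", "79032,83", "20679,06"))

as.integer(f)             # 2 3 1: the level codes, not the labels
as.numeric(f)             # the same codes, stored as doubles
levels(f)[as.integer(f)]  # "52053,34" "79032,83" "20679,06": codes mapped back to labels
```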
The first element of test$column is stored as the integer 5 because its value is the 5th level. Indexing the levels vector shows that the label of the 5th level is
levels(test$column)[5]
## [1] "52053,34"
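In general, we want to look up the label of each element and convert that label to numeric. Here the decimal commas must first be replaced by points; indexing the converted levels by the factor then uses its integer codes: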
as.numeric(sub(",", ".", levels(test$column))[test$column])
## [1] 52053.34 79032.83 20679.06 20799.56 20679.06 21279.45 51789.44 54189.45
## [9] 73138.89 73138.89
Alternatively, this shorter version also works, because sub coerces the factor to a character vector of its labels before substituting:
as.numeric(sub(",", ".", test$column))
## [1] 52053.34 79032.83 20679.06 20799.56 20679.06 21279.45 51789.44 54189.45
## [9] 73138.89 73138.89
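Both conversions can be wrapped in one helper. This is a sketch only: the name comma_to_numeric is hypothetical, and it assumes the comma is used solely as the decimal mark (no thousands separators such as "1.234,5"):

```r
# Hypothetical helper: converts a factor or character vector whose values
# use a comma as the decimal mark into a numeric vector.
# Assumes no thousands separators.
comma_to_numeric <- function(x) {
  if (is.factor(x)) {
    # Convert each distinct level once, then index by the integer codes
    as.numeric(sub(",", ".", levels(x), fixed = TRUE))[x]
  } else {
    as.numeric(sub(",", ".", x, fixed = TRUE))
  }
}

comma_to_numeric(factor(c("52053,34", "79032,83", "20679,06")))
comma_to_numeric(c("1,5", "2"))
```

For a long factor with few distinct levels, the factor branch does less string work, since only the levels are parsed.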
If the numbers had used decimal points rather than commas in the first place, this would have been sufficient, where x is such a factor:
as.numeric(as.character(x))
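R FAQ 7.10 also gives as.numeric(levels(x))[x] for this case, which converts each level once and then indexes by the codes. A quick check on a hypothetical factor x that it agrees with as.numeric(as.character(x)), while plain as.numeric(x) returns the codes:

```r
# Hypothetical factor whose labels already use decimal points
x <- factor(c("52053.34", "79032.83", "20679.06", "20679.06"))

a <- as.numeric(as.character(x))  # parses every element's label
b <- as.numeric(levels(x))[x]     # parses each level once, then indexes by code
identical(a, b)  # TRUE
as.numeric(x)    # 2 3 1 1: the codes, which is what the question ran into
```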
Upvotes: 1