Reputation: 17
I am a beginner in R and wanted to know if there is any way to convert multiple vectors/variables into a desired 'class' (e.g. 3 variables within a dataset are factors, and I want to convert these 3 into numerical variables in one go).
Below is the dataset, which contains the column "Product" as chr and the remaining columns as factors. However, I want to keep "Product" and "Month" as character and "Sales" and "Profit" as numeric.
str(Conditional_function_IVY)
'data.frame': 100 obs. of 4 variables:
$ Product: chr "Bellen" "Bellen" "Sunshine" "Sunset" ...
$ Month : Factor w/ 12 levels "April","August",..: 5 5 5 5 5 5 5 5 4 4 ...
$ Sales : Factor w/ 88 levels " ? 501.00 "," ? 504.00 ",..: 8 13 64 16 55 78 81 29 2 52 ...
$ Profit : Factor w/ 65 levels " ? 100.00 "," ? 101.00 ",..: 44 34 5 15 39 16 37 38 65 56 ...
I've done it in the following way but it consumes a lot of time, hence I am wondering if there is any way which would let me do this in one go.
Conditional_function_IVY$Month=as.character(Conditional_function_IVY$Month)
Conditional_function_IVY$Sales=as.numeric(Conditional_function_IVY$Sales)
Conditional_function_IVY$Profit=as.numeric(Conditional_function_IVY$Profit)
str(Conditional_function_IVY)
'data.frame': 100 obs. of 4 variables:
$ Product: chr "Bellen" "Bellen" "Sunshine" "Sunset" ...
$ Month : chr "January" "January" "January" "January" ...
$ Sales : num 8 13 64 16 55 78 81 29 2 52 ...
$ Profit : num 44 34 5 15 39 16 37 38 65 56 ...
Upvotes: 1
Views: 124
Reputation: 145775
I like Kevin's approach, except that I dislike the copy/paste/editing of as.numeric(gsub("[^0-9.]", "", as.character(...))). If you had even 10 columns this would be tedious; if you had 100 columns it would be utterly impractical. I would define a little utility function and do something like this:
# define helper function
sub_convert = function(x) as.numeric(gsub("[^0-9.]", "", as.character(x)))
# using base R
to_convert = names(Conditional_function_IVY)[sapply(Conditional_function_IVY, is.factor)]
Conditional_function_IVY[to_convert] = lapply(
Conditional_function_IVY[to_convert],
sub_convert
)
# or using dplyr
library(dplyr)
Conditional_function_IVY = mutate_if(
Conditional_function_IVY,
is.factor,
sub_convert
)
This scales better and also has the advantage that if you need to tweak the sub_convert function you only need to edit it in one place, instead of every time it is used.
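One caveat, sketched under assumptions: in the question's data `Month` is also a factor, so selecting every factor column would send it through `sub_convert` and turn it into `NA`. Assuming only `Sales` and `Profit` should become numeric, the columns can be named explicitly. The two-row data frame `df` and the name `numeric_cols` are made up for illustration.

```r
# Hypothetical two-row stand-in for the question's data frame.
df <- data.frame(
  Product = c("Bellen", "Sunshine"),
  Month   = c("January", "April"),
  Sales   = c(" ? 501.00 ", " ? 504.00 "),
  Profit  = c(" ? 100.00 ", " ? 101.00 "),
  stringsAsFactors = TRUE
)
df$Product <- as.character(df$Product)  # Product is chr in the question

# Helper: strip everything except digits and dots, then convert.
sub_convert <- function(x) as.numeric(gsub("[^0-9.]", "", as.character(x)))

# Assumption: only these two columns hold numbers.
numeric_cols <- c("Sales", "Profit")
df[numeric_cols] <- lapply(df[numeric_cols], sub_convert)

# Any remaining factor columns (here just Month) become plain character.
is_fct <- sapply(df, is.factor)
df[is_fct] <- lapply(df[is_fct], as.character)

str(df)
```

Naming the numeric columns is more typing than `is.factor`, but it avoids silently wiping out text columns such as `Month`.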
Upvotes: 1
Reputation: 6264
The best way of fixing this is at the time of data frame creation/import. More modern approaches from the tidyverse, such as readr and tibble, deal well with guessing column types and don't automatically convert to factor.
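As a rough base-R counterpart, assuming the data comes from a CSV file (the question does not say how it was imported): `read.csv` keeps text columns as character when `stringsAsFactors = FALSE`, which is already the default from R 4.0.0 onward. The inline CSV text and the name `imported` are invented for the example.

```r
# Made-up CSV content standing in for the real file.
csv_text <- "Product,Month,Sales,Profit
Bellen,April,? 501.00,? 100.00
Bellen,August,? 504.00,? 101.00"

# stringsAsFactors = FALSE keeps text columns as character
# (this is already the default in R >= 4.0.0).
imported <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Every column comes back as character, none as factor.
sapply(imported, class)
```

The odd `Sales` and `Profit` values still arrive as text, so they need cleaning afterwards, but no factor levels get in the way.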
If that is not an option for you, then you can transform with dplyr::mutate quite simply.
library(magrittr)
library(dplyr)
Conditional_function_IVY %<>%
mutate(
Month = as.character(Month),
Sales = as.numeric(as.character(Sales)),
Profit = as.numeric(as.character(Profit))
)
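The inner `as.character()` matters here: calling `as.numeric()` directly on a factor returns its internal level codes, which is why the question's output shows `Sales` as 8, 13, 64 rather than real amounts. A small illustration with a made-up factor `f`:

```r
# Made-up factor whose labels look like numbers.
# Levels are sorted, so "501" is level 1 and "504" is level 2.
f <- factor(c("504", "501", "504"))

as.numeric(f)                # level codes: 2 1 2
as.numeric(as.character(f))  # the labels as numbers: 504 501 504
```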
However, I notice you have some very strange values visible in your structure where your numeric values are stored. These could be stripped back to numeric using gsub, e.g.
as.numeric(gsub("[^0-9.]", "", " ? 501.00 ")) # [1] 501
Here it is applied to the two rows of your own data that I can derive from your question:
Conditional_function_IVY <- data.frame(
Product = rep("Bellen", 2),
Month = c("April", "August"),
Sales = c(" ? 501.00 ", " ? 504.00 "),
Profit = c(" ? 100.00 ", " ? 101.00 ")
)
Conditional_function_IVY %>%
mutate(
Month = as.character(Month),
Sales = as.numeric(gsub("[^0-9.]", "", as.character(Sales))),
Profit = as.numeric(gsub("[^0-9.]", "", as.character(Profit)))
)
# Product Month Sales Profit
# 1 Bellen April 501 100
# 2 Bellen August 504 101
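One thing to watch, offered as a hedged note: the pattern `[^0-9.]` also deletes minus signs, so negative amounts would silently become positive. If the data can contain negatives, adding `-` to the allowed set is one option. The input strings below are invented.

```r
# Invented values: a plain amount, a negative one, and one with a
# thousands separator.
x <- c(" ? 501.00 ", " ? -12.50 ", " ? 1,250.00 ")

as.numeric(gsub("[^0-9.]", "", x))   # 501.0 12.5 1250.0 (sign lost)
as.numeric(gsub("[^0-9.-]", "", x))  # 501.0 -12.5 1250.0 (sign kept)
```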
Upvotes: 1