user817995

Reputation: 17

Converting vector to numeric and character in one go

I am a beginner in R and wanted to know if there is any way to convert multiple vectors/variables into a desired 'class' (e.g. 3 variables within a dataset are factors, and I want to convert these 3 into numerical variables in one go).

Below is the dataset. "Product" is chr and the remaining columns are factors, but I want to keep "Product" and "Month" as character and make "Sales" and "Profit" numeric.

str(Conditional_function_IVY)

'data.frame':   100 obs. of  4 variables:
 $ Product: chr  "Bellen" "Bellen" "Sunshine" "Sunset" ...
 $ Month  : Factor w/ 12 levels "April","August",..: 5 5 5 5 5 5 5 5 4 4 ...
 $ Sales  : Factor w/ 88 levels " ? 501.00 "," ? 504.00 ",..: 8 13 64 16 55 78 81 29 2 52 ...
 $ Profit : Factor w/ 65 levels " ? 100.00 "," ? 101.00 ",..: 44 34 5 15 39 16 37 38 65 56 ...

I've done it in the following way but it consumes a lot of time, hence I am wondering if there is any way which would let me do this in one go.

Conditional_function_IVY$Month=as.character(Conditional_function_IVY$Month)
Conditional_function_IVY$Sales=as.numeric(Conditional_function_IVY$Sales)
Conditional_function_IVY$Profit=as.numeric(Conditional_function_IVY$Profit)
str(Conditional_function_IVY)
'data.frame':   100 obs. of  4 variables:
 $ Product: chr  "Bellen" "Bellen" "Sunshine" "Sunset" ...
 $ Month  : chr  "January" "January" "January" "January" ...
 $ Sales  : num  8 13 64 16 55 78 81 29 2 52 ...
 $ Profit : num  44 34 5 15 39 16 37 38 65 56 ...

Upvotes: 1

Views: 124

Answers (2)

Gregor Thomas

Reputation: 145775

I like Kevin's approach, except that I dislike the copy/paste/editing of as.numeric(gsub("[^0-9.]", "", as.character(...))). If you had even 10 columns this would be tedious; if you had 100 columns it would be utterly impractical. I would define a little utility function and do something like this:

# define helper function: strip everything except digits and dots, then convert
sub_convert = function(x) as.numeric(gsub("[^0-9.]", "", as.character(x)))

# using base R
to_convert = names(Conditional_function_IVY)[sapply(Conditional_function_IVY, is.factor)]
Conditional_function_IVY[to_convert] = lapply(
    Conditional_function_IVY[to_convert],
    sub_convert
)

# or using dplyr (in dplyr >= 1.0, mutate_if is superseded by mutate(across(where(is.factor), sub_convert)))
library(dplyr)
Conditional_function_IVY = mutate_if(
    Conditional_function_IVY,
    is.factor,
    sub_convert
)

This scales better and also has the advantage that if you need to tweak the sub_convert function you only need to edit it in one place, instead of every time it is used.
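One thing to watch with the is.factor selection: in the question's data, Month is also a factor, and its labels contain no digits, so sub_convert would turn it into NA. A minimal base R sketch that names the columns explicitly instead is below. The data frame df and the vectors num_cols and chr_cols are made up for illustration and are not the asker's real data.

```r
# Toy data mirroring the structure in the question (values are made up).
df <- data.frame(
  Product = c("Bellen", "Sunshine"),
  Month   = factor(c("January", "April")),
  Sales   = factor(c(" ? 501.00 ", " ? 504.00 ")),
  Profit  = factor(c(" ? 100.00 ", " ? 101.00 ")),
  stringsAsFactors = FALSE
)

# Same helper as above: keep only digits and dots, then convert.
sub_convert <- function(x) as.numeric(gsub("[^0-9.]", "", as.character(x)))

num_cols <- c("Sales", "Profit")  # columns that should become numeric
chr_cols <- "Month"               # columns that should become character

# Convert each group of columns in one assignment.
df[num_cols] <- lapply(df[num_cols], sub_convert)
df[chr_cols] <- lapply(df[chr_cols], as.character)

str(df)
```

Listing the columns by name costs one extra line but avoids converting a column just because it happens to be a factor.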

Upvotes: 1

Kevin Arseneau

Reputation: 6264

The best way to fix this is when the data frame is created or imported. More modern approaches from the tidyverse, such as readr and tibble, guess column types well and don't automatically convert strings to factors.
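If you are importing with base R rather than readr, a rough sketch of the same idea is to pass stringsAsFactors = FALSE to read.csv. The CSV text below is hypothetical, since the question does not show the original file; it only stands in for it.

```r
# Hypothetical CSV content standing in for the real file (not shown in the question).
csv_text <- 'Product,Month,Sales,Profit
Bellen,January," ? 501.00 "," ? 100.00 "
Sunshine,January," ? 504.00 "," ? 101.00 "'

# stringsAsFactors = FALSE keeps text columns as character.
# From R 4.0.0 onwards this is already the default.
imported <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(imported)
```

Sales and Profit still arrive as character here because of the stray characters around the numbers, so they need cleaning before they can become numeric.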

If that is not an option for you then you can transform with dplyr::mutate quite simply.

library(magrittr)
library(dplyr)

Conditional_function_IVY %<>%
  mutate(
    Month = as.character(Month),
    Sales = as.numeric(as.character(Sales)),
    Profit = as.numeric(as.character(Profit))
  )
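The as.character() step matters because calling as.numeric() directly on a factor returns its internal level codes, not the labels. That is why the Sales column in the question came out as 8, 13, 64, and so on. A small sketch with a made-up factor f (not the asker's data):

```r
# Toy factor whose labels look like numbers (made up for illustration).
f <- factor(c("501", "504", "501", "650"))

codes  <- as.numeric(f)                # integer level codes: 1 2 1 3
values <- as.numeric(as.character(f))  # the labels themselves: 501 504 501 650

print(codes)
print(values)
```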

However, I notice you have some very strange values visible in your structure where your numeric values are stored. These could be stripped back to numeric using gsub.

e.g. as.numeric(gsub("[^0-9.]", "", " ? 501.00 ")) # [1] 501

With two rows of your data

Using the two rows of your own data that I can derive from your question.

Conditional_function_IVY <- data.frame(
  Product = rep("Bellen", 2),
  Month = c("April", "August"),
  Sales = c(" ? 501.00 ", " ? 504.00 "),
  Profit = c(" ? 100.00 ", " ? 101.00 ")
)

Conditional_function_IVY %>%
  mutate(
    Month = as.character(Month),
    Sales = as.numeric(gsub("[^0-9.]", "", as.character(Sales))),
    Profit = as.numeric(gsub("[^0-9.]", "", as.character(Profit)))
  )

#   Product  Month Sales Profit
# 1  Bellen  April   501    100
# 2  Bellen August   504    101 

Upvotes: 1
