PCdL
PCdL

Reputation: 99

Dataframe column: Remove quotes, change decimals and turn into numeric

I have a character dataframe column that needs to be adjusted to a numeric.

I have tried substr, and then conversion to numeric, but the substr command does not recognize the quotes. I have also tried gsub, but it does recognize the length of the character and leaves the quotes intact.

The character column contains decimal comma's and quotes like this:

test <- as.data.frame(c("\"2,60\"", "\"1,30\"", "\"850,00\"", "\"1000,00\"", "\"57,25\"",  "\"98,00\""))
colnames(test) <- "v1" 

I want to change the column to numeric with decimal points like this:

test.num <- as.data.frame(c(2.60, 1.30, 850.00, 1000.00, 57.25, 98.00))
colnames(test.num) <- "v1"

I hope someone can offer some guidance.

Upvotes: 3

Views: 227

Answers (4)

Onyambu
Onyambu

Reputation: 79208

Read the data back in using , as the decimal operator:

read.table(text = as.character(test$v1), dec=',')

       V1
1    2.60
2    1.30
3  850.00
4 1000.00
5   57.25
6   98.00

Upvotes: 3

jay.sf
jay.sf

Reputation: 72758

Using gsub from base R you could replace all [[:punct:]]uation with nothing and divide the result by 100 (always two digits after the , assumed).

as.numeric(gsub("\\D", "", test$v1)) / 100
# [1]    2.60    1.30  850.00 1000.00   57.25   98.00

If the assumption does not hold, you could replace the "," with a ".".

as.numeric(gsub(",", ".", gsub("\"", "", test2$v1)))
# [1]    2.60    1.30  850.00 1000.00   57.25   98.00

Data

test <- structure(list(v1 = structure(c(3L, 1L, 5L, 2L, 4L, 6L), .Label = c("\"1,30\"", 
"\"1000,00\"", "\"2,60\"", "\"57,25\"", "\"850,00\"", "\"98,00\""
), class = "factor")), row.names = c(NA, -6L), class = "data.frame")

test2 <- structure(list(v1 = structure(c(3L, 1L, 5L, 2L, 4L, 6L), .Label = c("\"1,3\"", 
"\"1000,0\"", "\"2,60\"", "\"57,25\"", "\"850,00\"", "\"98,00\""
), class = "factor")), row.names = c(NA, -6L), class = "data.frame")

Upvotes: 3

s_baldur
s_baldur

Reputation: 33488

Using chartr():

as.numeric(chartr('",', ' .', test$v1))
# [1]    2.60    1.30  850.00 1000.00   57.25   98.00

Upvotes: 5

joran
joran

Reputation: 173557

Here's one option using a function from readr

readr::parse_number(as.character(test$v1),locale = locale(decimal_mark = ","))

Upvotes: 4

Related Questions