Reputation: 5497
I have a file with internationally formatted numbers (i.e strings) including units of measurement. In this case the decimal place is indicated by "," and the 1e3 seperator is indicated as "." (i.e. German number formats).
a <- c('2.200.222 €',
' 180.109,3 €')
or
b <- c('28,42 m²',
'47,70 m²')
I'd like to convert these strings efficiently to numeric
. I've tried to filter out numbers by codes like
require(stringr)
str_extract(a, pattern='[0-9]+.[0-9]+.[0-9]+')
str_extract(b, pattern='[0-9]+,[0-9]+')
however, this does seem to be too prone to errors and I guess there must be a more standardized way. So here's my question: Is there a custom function, package or something else that is capable of such a problem?
Thank you very much!
Upvotes: 3
Views: 1313
Reputation: 179418
Here is a function that uses gsub
to deal with the sample data you posted:
x <- c('2.200.222 €', ' 180.109,3 €', '28,42 m²', '47,70 m²')
strip <- function(x){
z <- gsub("[^0-9,.]", "", x)
z <- gsub("\\.", "", z)
gsub(",", ".", z)
}
as.numeric(strip(x))
[1] 2200222.00 180109.30 28.42 47.70
It works like this:
Upvotes: 8