Mark

Reputation: 1769

Extract the number contained in a string (including decimal numbers)

I have the following vector:

vec<-c("70,00 mln  €", "20,50 mln €", "400 mila", "400 mila", "400 mila", "100 mila", "50 mila")

In vec, mln means "millions" and mila means "thousands". I would like to convert this vector into a numeric vector like the following:

70000000, 20500000, 400000, 400000, 400000, 100000, 50000

e.g. 70000000 stands for 70,00 mln, 20500000 stands for 20,50 mln and so on.

I tried with the following:

unlist(regmatches(vec, gregexpr("[[:digit:]]+", vec)))

to take the numeric part of the strings and then multiply by 1000 or 1000000, but I obtained:

[1] "70"  "00"  "20"  "50"  "400" "400" "400" "100" "50" 

Here, "70" "00" should be just 70, and "20" "50" should be 20.5 (numeric).

EDIT: The vector above is just an example. The actual (longer) vector is the following:

vec <- c("70,00 mln  €", "20,50 mln €", "7,00 mln €", "1,90 mln €", 
"1,50 mln €", "16,00 mln €", "15,00 mln €", "3,00 mln €", 
"10 mln €", "6,70 mln €", "5,25 mln €", "4,80 mln €", 
"3,68 mln €", "1,19 mln €", "1,00 mln €", "21 mln €", 
"20 mln €", "3 mln €", "2 mln €", "1,95 mln €", "14.5 mln", 
"14.5 mln", "12 mln", "7 mln", "2,32 mln", "21,30 mln", "21 mln", 
"20 mln", "5 mln", "3,5 mln", "2 mln", "2 mln", "1,00 mln €", 
"19,92 mln", "12,70 mln", "8,00 mln", "1 mln", "4,50 mln", "1,95 mln", 
"4,50 mln", "1,95 mln", "1,00 mln €", "10,00 mln €", "2,00 mln €", 
"2 mln", "4,50 mln", "8,00 mln €", "4,90 mln €", "1,00 mln €", 
"400 mila", "400 mila", "400 mila", "100 mila", "50 mila", "600 mila €", 
"500 mila €", "500 mila €", "200 mila €", "600 mila", 
"520 mila", "200 mila", "100 mila", "500 mila €", "300 mila €", 
"200 mila €", "150 mila €", "20 mila €", "700 mila €", 
"500 mila", "500 mila", "600 mila €", "450 mila €", "33 mila €", 
"500 mila €", "700 mila €", "250 mila €", "100 mila €"
)

Upvotes: 1

Views: 53

Answers (1)

akrun

Reputation: 887651

An easier option is to change the decimal comma to a point with chartr, remove the spaces and the € sign, replace mln with e6 and mila with e3, and then convert the resulting strings with as.numeric, which understands scientific notation:

library(stringr)
as.numeric(str_replace_all(str_remove_all(chartr(",", ".", vec), 
        "\\s+€|\\s+"), c(mln = "e6", "mila" = "e3")))

-output

[1] 70000000 20500000   400000   400000   400000   100000    50000
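
To see what each step contributes, the same pipeline can be unrolled with base R only. This is just a sketch: the intermediate names s1, s2, s3 and res are mine, and gsub()/sub() stand in for the stringr calls above.

```r
vec <- c("70,00 mln  €", "20,50 mln €", "400 mila", "400 mila",
         "400 mila", "100 mila", "50 mila")

# Step 1: decimal comma -> decimal point, e.g. "70.00 mln  €"
s1 <- chartr(",", ".", vec)

# Step 2: drop all whitespace and the euro sign, e.g. "70.00mln"
s2 <- gsub("€", "", gsub("[[:space:]]+", "", s1), fixed = TRUE)

# Step 3: turn the unit words into exponents, e.g. "70.00e6"
s3 <- sub("mila", "e3", sub("mln", "e6", s2, fixed = TRUE), fixed = TRUE)

# Step 4: as.numeric() parses scientific notation directly
res <- as.numeric(s3)
res
```

Printing s1, s2 and s3 in turn makes it easy to spot an entry that does not follow the expected pattern before it silently becomes NA.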

Or using the updated vec in OP's post

as.numeric(str_replace_all(str_remove_all(chartr(",", ".", vec), 
         "\\s+€|\\s+"), c(mln = "e6", "mila" = "e3")))

-output

[1] 70000000 20500000  7000000  1900000  1500000 16000000 15000000  3000000 10000000  6700000  5250000  4800000  3680000  1190000
[15]  1000000 21000000 20000000  3000000  2000000  1950000 14500000 14500000 12000000  7000000  2320000 21300000 21000000 20000000
[29]  5000000  3500000  2000000  2000000  1000000 19920000 12700000  8000000  1000000  4500000  1950000  4500000  1950000  1000000
[43] 10000000  2000000  2000000  4500000  8000000  4900000  1000000   400000   400000   400000   100000    50000   600000   500000
[57]   500000   200000   600000   520000   200000   100000   500000   300000   200000   150000    20000   700000   500000   500000
[71]   600000   450000    33000   500000   700000   250000   100000
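
If the real data might contain entries that don't follow the "number + unit" pattern, a more defensive base R variant could look like the following. It is only a sketch: parse_amount() and its lookup table mult are hypothetical names, not part of stringr or base R, and it assumes mln and mila are the only units that occur.

```r
# Hypothetical helper (not from any package): returns NA for anything
# it cannot parse instead of guessing.
parse_amount <- function(x) {
  # Assumed multipliers for the two units seen in the data
  mult <- c(mln = 1e6, mila = 1e3)
  # "<number><optional space><unit>", with "," or "." as decimal separator
  pat <- "^\\s*([0-9]+(?:[.,][0-9]+)?)\\s*([A-Za-z]+).*$"

  out <- rep(NA_real_, length(x))
  ok  <- grepl(pat, x, perl = TRUE)

  # Numeric part, with the decimal comma normalised to a point
  num  <- as.numeric(chartr(",", ".", sub(pat, "\\1", x[ok], perl = TRUE)))
  # Unit part; units missing from `mult` yield NA
  unit <- sub(pat, "\\2", x[ok], perl = TRUE)

  out[ok] <- num * unname(mult[unit])
  out
}

parse_amount(c("20,50 mln €", "14.5 mln", "400 mila", "n/a"))
```

The last element comes back as NA, which makes malformed rows easy to find with is.na(). Because the number is multiplied rather than parsed as a single literal, it is safer to compare results with all.equal() than with ==.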

Upvotes: 4
