Reputation: 1769
I have the following vector:
vec<-c("70,00 mln €", "20,50 mln €", "400 mila", "400 mila", "400 mila", "100 mila", "50 mila")
In vec, mln means "millions" whereas mila means "thousand". I would like to convert this vector into a numeric vector like the following:
70000000, 20500000, 400000, 400000, 400000, 100000, 50000
e.g. 70000000 stands for 70,00 mln, 20500000 stands for 20,50 mln and so on.
I tried with the following:
unlist(regmatches(vec, gregexpr("[[:digit:]]+", vec)))
to take the numeric part of the strings and then multiply by 1000 or 1000000, but I obtained:
[1] "70" "00" "20" "50" "400" "400" "400" "100" "50"
Here, "70" "00" should be just "70", and "20" "50" should instead be 20.5 (numeric).
EDIT: The vector above is just an example. The real (longer) vector is the following:
vec <- c("70,00 mln €", "20,50 mln €", "7,00 mln €", "1,90 mln €",
"1,50 mln €", "16,00 mln €", "15,00 mln €", "3,00 mln €",
"10 mln €", "6,70 mln €", "5,25 mln €", "4,80 mln €",
"3,68 mln €", "1,19 mln €", "1,00 mln €", "21 mln €",
"20 mln €", "3 mln €", "2 mln €", "1,95 mln €", "14.5 mln",
"14.5 mln", "12 mln", "7 mln", "2,32 mln", "21,30 mln", "21 mln",
"20 mln", "5 mln", "3,5 mln", "2 mln", "2 mln", "1,00 mln €",
"19,92 mln", "12,70 mln", "8,00 mln", "1 mln", "4,50 mln", "1,95 mln",
"4,50 mln", "1,95 mln", "1,00 mln €", "10,00 mln €", "2,00 mln €",
"2 mln", "4,50 mln", "8,00 mln €", "4,90 mln €", "1,00 mln €",
"400 mila", "400 mila", "400 mila", "100 mila", "50 mila", "600 mila €",
"500 mila €", "500 mila €", "200 mila €", "600 mila",
"520 mila", "200 mila", "100 mila", "500 mila €", "300 mila €",
"200 mila €", "150 mila €", "20 mila €", "700 mila €",
"500 mila", "500 mila", "600 mila €", "450 mila €", "33 mila €",
"500 mila €", "700 mila €", "250 mila €", "100 mila €"
)
Upvotes: 1
Views: 53
Reputation: 887651
An easier option is to change the decimal comma to a dot, remove the whitespace and the € sign, replace mln and mila with e6 and e3 respectively, and then convert to numeric with as.numeric:
library(stringr)
as.numeric(str_replace_all(str_remove_all(chartr(",", ".", vec),
"\\s+€|\\s+"), c(mln = "e6", "mila" = "e3")))
-output
[1] 70000000 20500000 400000 400000 400000 100000 50000
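To see why this works, the one-liner can be unpacked into its intermediate steps. The following is only a sketch on the short example vector; the names step1 to step3 and res are introduced here for illustration and are not part of the original answer.

```r
library(stringr)

# Short example vector from the question
vec <- c("70,00 mln €", "20,50 mln €", "400 mila", "400 mila",
         "400 mila", "100 mila", "50 mila")

# Step 1: turn the decimal comma into a dot ("70,00 mln €" -> "70.00 mln €")
step1 <- chartr(",", ".", vec)

# Step 2: drop whitespace and the euro sign ("70.00 mln €" -> "70.00mln")
step2 <- str_remove_all(step1, "\\s+€|\\s+")

# Step 3: swap the unit words for exponent suffixes ("70.00mln" -> "70.00e6")
step3 <- str_replace_all(step2, c(mln = "e6", mila = "e3"))

# Step 4: as.numeric() parses strings written in scientific notation
res <- as.numeric(step3)
```

The key idea is that a string such as "70.00e6" is already a valid number in scientific notation, so no explicit multiplication is needed.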
Or using the updated vec in the OP's post:
as.numeric(str_replace_all(str_remove_all(chartr(",", ".", vec),
"\\s+€|\\s+"), c(mln = "e6", "mila" = "e3")))
-output
[1] 70000000 20500000 7000000 1900000 1500000 16000000 15000000 3000000 10000000 6700000 5250000 4800000 3680000 1190000
[15] 1000000 21000000 20000000 3000000 2000000 1950000 14500000 14500000 12000000 7000000 2320000 21300000 21000000 20000000
[29] 5000000 3500000 2000000 2000000 1000000 19920000 12700000 8000000 1000000 4500000 1950000 4500000 1950000 1000000
[43] 10000000 2000000 2000000 4500000 8000000 4900000 1000000 400000 400000 400000 100000 50000 600000 500000
[57] 500000 200000 600000 520000 200000 100000 500000 300000 200000 150000 20000 700000 500000 500000
[71] 600000 450000 33000 500000 700000 250000 100000
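For comparison, the approach attempted in the question (extract the numeric part, then multiply) can also be made to work in base R without stringr. This is a minimal sketch, not part of the original answer: parse_amount is a hypothetical helper name, and it assumes every unit is either mln or mila and that "," and "." are only ever used as decimal separators.

```r
# Hypothetical helper (not from the original posts); base R only
parse_amount <- function(x) {
  # Match digits with an optional decimal part after "," or ".",
  # so "70,00" stays one token instead of splitting into "70" and "00"
  m <- regexpr("[[:digit:]]+([,.][[:digit:]]+)?", x)
  num <- rep(NA_character_, length(x))
  num[m > 0] <- regmatches(x, m)   # elements without a number stay NA
  num <- as.numeric(sub(",", ".", num, fixed = TRUE))

  # Multiplier: 1e6 for "mln", 1e3 for "mila", NA for anything else
  mult <- ifelse(grepl("mln", x, fixed = TRUE), 1e6,
                 ifelse(grepl("mila", x, fixed = TRUE), 1e3, NA_real_))

  num * mult
}

# Usage on the short example vector from the question
vec <- c("70,00 mln €", "20,50 mln €", "400 mila", "400 mila",
         "400 mila", "100 mila", "50 mila")
parse_amount(vec)
```

Because the multiplication is done in floating point, some values may differ from the e6/e3 result by a tiny rounding error, so it is safer to compare with all.equal() than with ==, or to wrap the result in round() if whole numbers are needed.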
Upvotes: 4