Misha

Reputation: 173

regex for large currency amounts

I'm trying to write a regular expression that extracts amounts from a string, together with their currency sign and any amount abbreviation (m or k):

library(stringr)
text <- "$10000 and $10,000 and $5m and $50m and $50.2m and $50,2m"
str_extract(text, "\\$(\\d+)[a-z]+")  # solution_1
str_extract(text, "\\$(\\d+)+")       # solution_2

Desired output:

"$10000 $10,000 $5m $50m $50.2m $50,2m"

The problem is that solution_1 extracts only "$5m" and solution_2 only "$10000".
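Part of the reason each attempt returns only one amount is that str_extract() returns only the first match in a string (str_extract_all() returns every match). Neither pattern can cross a "," or "." either. A minimal base-R sketch of the same first-match vs. all-matches distinction uses regexpr() and gregexpr(); the widened character class here is borrowed for illustration and is not the asker's pattern:

```r
text <- "$10000 and $10,000 and $5m and $50m and $50.2m and $50,2m"
pattern <- "\\$[0-9.,]+[mbt]?"

# regexpr() reports only the first match per string (like stringr::str_extract)
first_only <- regmatches(text, regexpr(pattern, text))
# gregexpr() reports every match (like stringr::str_extract_all)
all_matches <- regmatches(text, gregexpr(pattern, text))[[1]]

first_only
all_matches
```

first_only holds only "$10000", while all_matches holds all six amounts.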

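UPDATE: @Tim Biegeleisen provided a great solution. I am also trying to get rid of a trailing period, e.g. to get $50m from "$50m. and ...". My attempt below does not match "$50m." at all, because the lookahead only accepts ", ", a space, or the end of the string right after the amount: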

text <- "$5, $10,000, and $5m, and $50m. and $50.2m and $50,2m"
m <- gregexpr("\\$[0-9.,]+?[mbt]?(?=(?:, | |$))", text, perl=TRUE)
regmatches(text, m)
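One way to keep a trailing period or comma out of the match is to require at least one digit after every separator, so "." or "," is consumed only when a digit follows it. This is a sketch, not an answer from the thread; adding k to the suffix class is an assumption based on the question's mention of k:

```r
text <- "$5, $10,000, and $5m, and $50m. and $50.2m and $50,2m"

# \\$              a literal dollar sign
# \\d+             the leading digits
# (?:[.,]\\d+)*    zero or more ",000" / ".2" groups; a separator is taken only
#                  when a digit follows it, so trailing punctuation is left out
# [mbtk]?          an optional magnitude suffix (k added as an assumption)
m <- gregexpr("\\$\\d+(?:[.,]\\d+)*[mbtk]?", text, perl = TRUE)
regmatches(text, m)[[1]]
```

This returns "$5" "$10,000" "$5m" "$50m" "$50.2m" "$50,2m".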

Upvotes: 0

Views: 308

Answers (3)

akrun

Reputation: 887391

Maybe we could use gsub, since the OP's expected output is shown as a single string:

gsub("\\b[A-Za-z]+,?|[,.](\\s)", "\\1", text)
#[1] "$10000  $10,000  $5m  $50m  $50.2m  $50,2m"
#[2] "$5 $10,000  $5m  $50m  $50.2m  $50,2m"     

data

text <- c("$10000 and $10,000 and $5m and $50m and $50.2m and $50,2m",
          "$5, $10,000, and $5m, and $50m. and $50.2m and $50,2m")
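The gsub() call above leaves two spaces wherever a word such as "and" was removed. If single spaces are wanted, as in the question's desired output, one option (a follow-up sketch, not part of the original answer) is to collapse whitespace runs afterwards:

```r
text <- c("$10000 and $10,000 and $5m and $50m and $50.2m and $50,2m",
          "$5, $10,000, and $5m, and $50m. and $50.2m and $50,2m")

# the answer's substitution: drop words (plus an optional trailing comma),
# and replace a comma/period followed by whitespace with that whitespace
cleaned <- gsub("\\b[A-Za-z]+,?|[,.](\\s)", "\\1", text)
# collapse the leftover runs of spaces and trim both ends
cleaned <- trimws(gsub("\\s+", " ", cleaned))
cleaned
```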

Upvotes: 0

erocoar

Reputation: 5893

You could also do it this way, for example:

txt <- unlist(strsplit(text, split = " "))
txt[grep("\\$\\d+((,|\\.)?)(\\d*)?(m)?", txt)]

[1] "$10000"  "$10,000" "$5m"     "$50m"    "$50.2m"  "$50,2m" 
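Everything after \\$\\d+ in this grep() pattern is optional and the pattern is unanchored, so it effectively keeps any token that contains a $ followed by a digit. On the updated string from the question, tokens such as "$5," and "$10,000," would keep their trailing comma. A sketch that anchors the test and strips trailing punctuation afterwards (the sub() cleanup is an addition, not part of this answer):

```r
text <- "$5, $10,000, and $5m, and $50m. and $50.2m and $50,2m"

# split on runs of whitespace into individual tokens
tokens <- unlist(strsplit(text, "\\s+"))
# keep only tokens that start with "$" followed by a digit
amounts <- tokens[grepl("^\\$\\d", tokens)]
# strip a trailing comma or period left over from the surrounding sentence
amounts <- sub("[.,]+$", "", amounts)
amounts
```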

Upvotes: 0

Tim Biegeleisen

Reputation: 521804

Try using gregexpr with regmatches:

text <- "$10000 and $10,000 and $5m and $50m and $50.2m and $50,2m"
m <- gregexpr("\\$[0-9.,]+[mbt]?", text)

regmatches(text, m)
[[1]]
[1] "$10000"  "$10,000" "$5m"     "$50m"    "$50.2m"  "$50,2m"


I am assuming that an amount string is made up only of digits, commas, and decimal points, and that it may end in m, b, or t (for million, billion, or trillion).
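The question also mentions a k suffix, which the [mbt] class above does not cover. A small variation, assuming k should be treated like the other suffixes, adds it to the class and uses unlist() to flatten the per-string results into one vector when text holds several strings (the second input string is made up for illustration):

```r
text <- c("$10000 and $10,000 and $5m and $50m and $50.2m and $50,2m",
          "$500k and $2.5b")

# same character-class idea, with k added to the suffixes (an assumption)
m <- gregexpr("\\$[0-9.,]+[mbtk]?", text)
# unlist() flattens the list (one element per input string) into one vector
unlist(regmatches(text, m))
```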

Upvotes: 3
