Nev1111

Reputation: 1049

Extracting US dollar amount

This question has been asked before but I am still not able to make this work entirely. I have the following examples of strings:

"Transfer to Retirement Rsvs-MA FX                   .11"                
"Opening Balance                FX        342,536,002.63"     
"VA                 85.85"               
"VB                   .00"     
"Manual Adjustment              FX              6,838.36-"

I would like to extract the US dollar/cents amount from each string into a separate column of a dataframe. I have the following regular expression:

rx = (r"(\$?(?:\d+,)*\d+\.\d+\-?)")

and I tried to create a column called "dollars" in the dataframe (df2):

df2['dollars']=df2['description'].str.extract(rx)

It works for the most part, except for values like .11 or .00, for which NaN is returned. How can I revise the expression so that it also matches amounts that have cents but no digits before the decimal point?

Help with this is greatly appreciated!

string                                                       dollars
Transfer to Retirement Rsvs-MA FX                   .11      0.11
Opening Balance                FX        342,536,002.63      342,536,002.63
VA                    85.85                                  85.85
VB                   .00                                     .00
Manual Adjustment FX 6,838.36-                               6,838.36-

Upvotes: 3

Views: 666

Answers (2)

Wiktor Stribiżew

Reputation: 626950

You may use

r'\$?(?<!\d)(?:\d{1,3}(?:,\d{3})*|\d{4,})?\.?\d+'

See the regex demo

Details

  • \$? - an optional $ char
  • (?<!\d) - make sure there is no digit immediately to the left
  • (?:\d{1,3}(?:,\d{3})*|\d{4,})? - an optional group matching either of the two patterns:
    • \d{1,3}(?:,\d{3})* - 1 to 3 digits followed by 0 or more occurrences of a comma and three digits
    • | - or
    • \d{4,} - four or more digits
  • \.? - an optional dot
  • \d+ - 1+ digits.
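
As a rough check, the pattern can be run against the sample strings from the question. This is only a sketch: it uses the standard `re` module rather than pandas, the helper name `first_amount` is made up for illustration, and the trailing `-?` is an addition (not part of the pattern above) so that "6,838.36-" keeps its sign.

```python
import re

# Pattern from the answer, plus an optional trailing "-" (an assumption added
# here so that amounts written like "6,838.36-" keep their sign).
AMOUNT = re.compile(r'\$?(?<!\d)(?:\d{1,3}(?:,\d{3})*|\d{4,})?\.?\d+-?')

samples = [
    "Transfer to Retirement Rsvs-MA FX                   .11",
    "Opening Balance                FX        342,536,002.63",
    "VA                 85.85",
    "VB                   .00",
    "Manual Adjustment              FX              6,838.36-",
]

def first_amount(text):
    """Return the first amount-like substring in text, or None if there is none."""
    match = AMOUNT.search(text)
    return match.group(0) if match else None

results = [first_amount(s) for s in samples]
print(results)
# ['.11', '342,536,002.63', '85.85', '.00', '6,838.36-']
```

Two things to keep in mind. Because the dot is optional, a bare integer that appears earlier in the description (an account number, say) would be picked up first. And `Series.str.extract` requires at least one capture group, so with pandas the whole pattern would need to be wrapped in parentheses before being passed to `df2['description'].str.extract(...)`.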

Upvotes: 1

The fourth bird

Reputation: 163372

You might use:

(?<!\S)\$?(?:\d{1,3}(?:\,\d{3})*)?\.\d{2}-?(?!\S)
  • (?<!\S) Whitespace boundary on the left
  • \$? Optional dollar sign
  • (?:\d{1,3}(?:\,\d{3})*)? Optional part matching 1-3 digits followed by zero or more groups of a comma and 3 digits
  • \.\d{2} Match a . and 2 digits
  • -? Optional hyphen
  • (?!\S) Whitespace boundary on the right

Regex demo

If you want 1+ digits after the dot, change \.\d{2} to \.\d+
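
To illustrate the difference between the two-decimal version and the `\.\d+` variant, here is a small sketch using the standard `re` module. The names `STRICT`, `LOOSE` and `find` are made up for this example, and the test strings other than those from the question are invented.

```python
import re

# Pattern from the answer: whitespace boundaries on both sides,
# exactly two decimals, optional trailing "-".
STRICT = re.compile(r'(?<!\S)\$?(?:\d{1,3}(?:\,\d{3})*)?\.\d{2}-?(?!\S)')

# Variant mentioned in the answer: one or more digits after the dot.
LOOSE = re.compile(r'(?<!\S)\$?(?:\d{1,3}(?:\,\d{3})*)?\.\d+-?(?!\S)')

def find(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else None

# Strings from the question.
print(find(STRICT, "VB                   .00"))
print(find(STRICT, "Manual Adjustment              FX              6,838.36-"))

# Invented strings: three decimals are rejected by STRICT but accepted by LOOSE,
# and a bare integer earlier in the text is skipped because the dot is required.
print(find(STRICT, "Fee 12.345"))
print(find(LOOSE, "Fee 12.345"))
print(find(STRICT, "Acct 1234 paid 5.00"))
```

Since the decimal point is mandatory here, whole numbers without cents are never matched, which may or may not be what you want. As with any pattern passed to `Series.str.extract`, it would need to be wrapped in a capture group for use with pandas.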

Upvotes: 1
