Extracting US dollar amount

Question

This question has been asked before, but I still cannot make it work in every case. I have the following example strings:

"Transfer to Retirement Rsvs-MA FX                   .11"                
"Opening Balance                FX        342,536,002.63"     
"VA                 85.85"               
"VB                   .00"     
"Manual Adjustment              FX              6,838.36-"

I would like to extract the US dollar/cents amount from each string into a separate column of a dataframe. I have the following regular expression:

rx = (r"(\$?(?:\d+,)*\d+\.\d+\-?)")

and I tried to create a column called "dollars" in the dataframe (df2):

df2['dollars']=df2['description'].str.extract(rx)

It works for the most part, except for values like .11 or .00, in which case NaN is returned. How do I revise the expression so that it also matches cents with no leading dollar digits?
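
A minimal reproduction of the behaviour, sketched with the standard `re` module rather than pandas so that it runs on its own (the helper name `extract` is made up for this illustration):

```python
import re

# The pattern from the question, unchanged.
rx = r"(\$?(?:\d+,)*\d+\.\d+\-?)"

samples = [
    "Transfer to Retirement Rsvs-MA FX                   .11",
    "Opening Balance                FX        342,536,002.63",
    "VA                 85.85",
    "VB                   .00",
    "Manual Adjustment              FX              6,838.36-",
]

def extract(text):
    """Return the first captured amount, or None when nothing matches."""
    m = re.search(rx, text)
    return m.group(1) if m else None

found = [extract(s) for s in samples]
print(found)
```

The pattern demands `\d+` before `\.`, so a value with no digit ahead of the decimal point (.11, .00) can never match, while the other three strings do.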

Help with this is greatly appreciated!

string                                                       dollars
Transfer to Retirement Rsvs-MA FX                   .11      0.11
Opening Balance                FX        342,536,002.63      342,536,002.63
VA                 85.85                                     85.85
VB                   .00                                     .00
Manual Adjustment              FX              6,838.36-     6,838.36-

Wiktor Stribiżew · Accepted Answer

You may use

r'\$?(?<!\d)(?:\d{1,3}(?:,\d{3})*|\d{4,})?\.?\d+'

See the regex demo

Details


\$? - an optional $ char
(?<!\d) - make sure there is no digit immediately to the left
(?:\d{1,3}(?:,\d{3})*|\d{4,})? - an optional occurrence of either of the two patterns:
    \d{1,3}(?:,\d{3})* - 1 to 3 digits followed by 0 or more occurrences of a comma and three digits
    | - or
    \d{4,} - four or more digits
\.? - an optional dot
\d+ - 1+ digits.
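
The components above can be tried against the sample strings from the question. The following is only a sketch: it uses the standard `re` module so that it runs without pandas, the names `BASE`, `WITH_SIGN` and `first_amount` are made up for illustration, and the trailing `-?` is an added assumption (not one of the listed components) that keeps the minus sign on values such as 6,838.36-.

```python
import re

# Pattern assembled from the components described above.
# (?<!\d) is the usual spelling of "no digit immediately to the left".
BASE = r'\$?(?<!\d)(?:\d{1,3}(?:,\d{3})*|\d{4,})?\.?\d+'

# Assumption, not part of the listed components: an optional trailing
# minus sign, so that negative amounts keep their "-".
WITH_SIGN = BASE + r'-?'

samples = [
    "Transfer to Retirement Rsvs-MA FX                   .11",
    "Opening Balance                FX        342,536,002.63",
    "VA                 85.85",
    "VB                   .00",
    "Manual Adjustment              FX              6,838.36-",
]

def first_amount(text, pattern=WITH_SIGN):
    """Return the first amount found in text, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

amounts = [first_amount(s) for s in samples]
print(amounts)
# ['.11', '342,536,002.63', '85.85', '.00', '6,838.36-']
```

With pandas, `str.extract` needs a capture group, so the same pattern would have to be wrapped in parentheses, for example `df2['description'].str.extract('(' + WITH_SIGN + ')', expand=False)`.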
