Reputation: 1049
This question has been asked before but I am still not able to make this work entirely. I have the following examples of strings:
"Transfer to Retirement Rsvs-MA FX .11"
"Opening Balance FX 342,536,002.63"
"VA 85.85"
"VB .00"
"Manual Adjustment FX 6,838.36-"
I would like to extract the US dollar-and-cents amount from each string into a separate column of a dataframe. I have the following regular expression:
rx = (r"(\$?(?:\d+,)*\d+\.\d+\-?)")
and I tried to create a column called "dollars" in the dataframe (df2):
df2['dollars']=df2['description'].str.extract(rx)
It works for the most part, except for values like .11 or .00, in which case NaN is returned. How do I revise this expression so that it also matches cents-only amounts that have no leading dollar digits?
Help with this is greatly appreciated!
string                                | dollars
Transfer to Retirement Rsvs-MA FX .11 | 0.11
Opening Balance FX 342,536,002.63     | 342,536,002.63
VA 85.85                              | 85.85
VB .00                                | .00
Manual Adjustment FX 6,838.36-        | 6,836-
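A minimal sketch that reproduces the problem, using Python's standard re module instead of pandas so that it stays self-contained (that substitution, and the helper name first_amount, are assumptions for illustration). A None here corresponds to the NaN that str.extract returns when nothing matches:

```python
import re

# The pattern from the question. Note that \d+ before \. is mandatory,
# so an amount that starts with the dot can never match.
rx = re.compile(r"(\$?(?:\d+,)*\d+\.\d+\-?)")

samples = [
    "Transfer to Retirement Rsvs-MA FX .11",
    "Opening Balance FX 342,536,002.63",
    "VA 85.85",
    "VB .00",
    "Manual Adjustment FX 6,838.36-",
]


def first_amount(pattern, text):
    """Return the first captured amount, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None


results = [first_amount(rx, s) for s in samples]
for s, r in zip(samples, results):
    print(f"{s!r} -> {r!r}")
```

The two cents-only strings come back as None because the pattern requires at least one digit in front of the decimal point.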
Upvotes: 3
Views: 666
Reputation: 626950
You may use
r'\$?(?<!\d)(?:\d{1,3}(?:,\d{3})*|\d{4,})?\.?\d+'
Details
\$? - an optional $ char
(?<!\d) - make sure there is no digit immediately to the left
(?:\d{1,3}(?:,\d{3})*|\d{4,})? - an optional group matching either of the two patterns:
  \d{1,3}(?:,\d{3})* - 1 to 3 digits followed with 0 or more occurrences of a comma and three digits
  | - or
  \d{4,} - four or more digits
\.? - an optional dot
\d+ - 1+ digits.
Upvotes: 1
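For the pattern in the answer above, a quick check against the question's sample strings might look like the sketch below. It uses the standard re module rather than pandas (an assumption made to keep it self-contained), and the names rx1, rx1_signed, and first_amount are hypothetical. The pattern is wrapped in one capture group because Series.str.extract requires at least one group; the second variant appends -? as one possible way to keep the trailing minus sign, which the answer's pattern does not capture.

```python
import re

# The answer's pattern, wrapped in a single capture group.
rx1 = re.compile(r"(\$?(?<!\d)(?:\d{1,3}(?:,\d{3})*|\d{4,})?\.?\d+)")

# Hypothetical variant: same pattern plus an optional trailing hyphen.
rx1_signed = re.compile(r"(\$?(?<!\d)(?:\d{1,3}(?:,\d{3})*|\d{4,})?\.?\d+-?)")

samples = [
    "Transfer to Retirement Rsvs-MA FX .11",
    "Opening Balance FX 342,536,002.63",
    "VA 85.85",
    "VB .00",
    "Manual Adjustment FX 6,838.36-",
]


def first_amount(pattern, text):
    """Return the first captured amount, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None


plain = [first_amount(rx1, s) for s in samples]
signed = [first_amount(rx1_signed, s) for s in samples]
print(plain)
print(signed)
```

With pandas, the grouped pattern string could be passed as df2['description'].str.extract(...). Because the dot is optional in this pattern, a bare integer elsewhere in a description would also match.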
Reputation: 163372
You might use:
(?<!\S)\$?(?:\d{1,3}(?:\,\d{3})*)?\.\d{2}-?(?!\S)
(?<!\S) - Whitespace boundary on the left
\$? - Optional dollar sign
(?:\d{1,3}(?:\,\d{3})*)? - Optional part matching 1-3 digits optionally repeated by comma and 3 digits
\.\d{2} - Match a . and 2 digits
-? - Optional hyphen
(?!\S) - Whitespace boundary on the right

If you want 1+ digits after the dot, change \.\d{2} to \.\d+
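A rough sketch of how this pattern behaves on the question's sample strings, again using the standard re module rather than pandas (an assumption for self-containment). The names rx2, rx2_loose, and first_amount, and the extra test string "Fee .5", are hypothetical. The amount is wrapped in a capture group, with the two whitespace lookarounds left outside it, so the same pattern string could also be handed to Series.str.extract:

```python
import re

# Exactly two digits after the dot.
rx2 = re.compile(r"(?<!\S)(\$?(?:\d{1,3}(?:\,\d{3})*)?\.\d{2}-?)(?!\S)")

# Variant with \.\d+ : one or more digits after the dot.
rx2_loose = re.compile(r"(?<!\S)(\$?(?:\d{1,3}(?:\,\d{3})*)?\.\d+-?)(?!\S)")

samples = [
    "Transfer to Retirement Rsvs-MA FX .11",
    "Opening Balance FX 342,536,002.63",
    "VA 85.85",
    "VB .00",
    "Manual Adjustment FX 6,838.36-",
]


def first_amount(pattern, text):
    """Return the first captured amount, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None


strict = [first_amount(rx2, s) for s in samples]
print(strict)

# A made-up string with a single digit after the dot shows the difference.
one_digit_strict = first_amount(rx2, "Fee .5")
one_digit_loose = first_amount(rx2_loose, "Fee .5")
print(one_digit_strict, one_digit_loose)
```

Since the dot and the digits after it are mandatory here, whole numbers without a decimal part are not matched, and the whitespace boundaries keep the pattern from matching inside a longer token.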
Upvotes: 1