Reputation: 229
I have a text file that contains security names, $ amounts, and % of the portfolio. I'm trying to figure out how to separate the companies using regex. My original solution used .split('%') and then created the 3 variables I needed, but I discovered that some of the securities contain % in their name, so that solution was inadequate.
String example:
Pinterest, Inc. Series F, 8.00%$24,808,9320.022%ResMed,Inc.$23,495,3260.021%Eaton Corp. PLC$53,087,8430.047%
Current regex
[a-zA-Z0-9,$.\s]+[.0-9%]$
My current regex only finds the last company, i.e. Eaton Corp. PLC$53,087,8430.047%
Any help on how I can find every single instance of a company?
Solution desired
["Pinterest, Inc. Series F, 8.00%$24,808,9320.022%","ResMed,Inc.$23,495,3260.021%","Eaton Corp. PLC$53,087,8430.047%"]
Upvotes: 2
Views: 109
Reputation: 887
A working solution for Python, with named groups: https://regex101.com/r/sqkFaN/2
(?P<item>(?P<name>.*?)\$(?P<usd>[\d,\.]*?%))
At the link I provided, you can see changes to the pattern take effect in real time, and the sidebar explains the syntax used.
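For anyone who wants to try the named groups outside regex101, here is a minimal sketch using the sample string from the question. The variable names (`text`, `rows`) are just for illustration. Note that the `usd` group as written captures the dollar amount and the percentage together, so it would still need a further split.

```python
import re

# Sample string from the question
text = ('Pinterest, Inc. Series F, 8.00%$24,808,9320.022%'
        'ResMed,Inc.$23,495,3260.021%'
        'Eaton Corp. PLC$53,087,8430.047%')

pattern = re.compile(r'(?P<item>(?P<name>.*?)\$(?P<usd>[\d,\.]*?%))')

# finditer yields one match object per security; read the named groups from each
rows = [(m.group('name'), m.group('usd')) for m in pattern.finditer(text)]

for name, usd in rows:
    print(name, '|', usd)
```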
Upvotes: 1
Reputation: 17051
In Python 3:
import re
p = re.compile(r'[^$]+\$[^%]+%')
p.findall('Pinterest, Inc. Series F, 8.00%$24,808,9320.022%ResMed,Inc.$23,495,3260.021%Eaton Corp. PLC$53,087,8430.047%')
Result:
['Pinterest, Inc. Series F, 8.00%$24,808,9320.022%', 'ResMed,Inc.$23,495,3260.021%', 'Eaton Corp. PLC$53,087,8430.047%']
Your initial issue was that the $ anchor made the regex only match at the end of the line. However, removing the $ still split Pinterest into two entries at the % after 8.00.
To fix that, the regex looks for a $, then a % after that, and takes everything up through the % as an entry. That pattern works for the examples you gave, but, of course, I can't know if it holds true for all your data.
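To see both problems concretely, this short sketch runs the regex from the question with and without the trailing $ anchor on the sample string. The variable names are only for illustration.

```python
import re

text = ('Pinterest, Inc. Series F, 8.00%$24,808,9320.022%'
        'ResMed,Inc.$23,495,3260.021%'
        'Eaton Corp. PLC$53,087,8430.047%')

# Regex from the question: the trailing $ anchors the match to the end of the string
anchored = re.findall(r'[a-zA-Z0-9,$.\s]+[.0-9%]$', text)

# Same regex without the anchor: every % now ends a match, including the one after 8.00
unanchored = re.findall(r'[a-zA-Z0-9,$.\s]+[.0-9%]', text)

print(anchored)    # only the last company
print(unanchored)  # four pieces, with Pinterest split in two
```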
Edit: The regex works like this:
r'      Use a raw string so you don't have to double the backslashes
[^$]+   Look for anything up to the next $
\$      Match the $ itself (\$ because $ alone means end-of-line)
[^%]+   Now anything up to the next %
%       And the % itself
'       End of the string
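If you also want the 3 variables per security, one possible extension is to add capture groups. This is only a sketch and rests on an assumption that is not stated in the question: the dollar amount is a whole number with comma thousands separators, so its last group always has exactly three digits and whatever follows is the percentage. Amounts under 1,000 could be ambiguous under this rule, so check it against your real data.

```python
import re

text = ('Pinterest, Inc. Series F, 8.00%$24,808,9320.022%'
        'ResMed,Inc.$23,495,3260.021%'
        'Eaton Corp. PLC$53,087,8430.047%')

# ASSUMPTION: amounts look like 1,234,567 (comma-grouped, no cents)
fields = re.compile(
    r'([^$]+)'                 # name: everything up to the next $
    r'\$(\d{1,3}(?:,\d{3})*)'  # amount: 1-3 digits, then groups of ,ddd
    r'(\d+\.\d+%)'             # percent: the digits left over, up to the %
)

# With three groups, findall returns one (name, amount, percent) tuple per entry
securities = fields.findall(text)

for name, amount, percent in securities:
    print(name, '|', amount, '|', percent)
```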
Upvotes: 3