Reputation: 57
I'm using regular expressions in Python to parse the data below.
The data, stored in the string thing1:
<a class="screener-link-primary" href="quote.ashx?t=IDXG&ty=c&p=d&b=1">IDXG</a>,
<a class="screener-link-primary" href="quote.ashx?t=INVN&ty=c&p=d&b=1">INVN</a>,
<a class="screener-link-primary" href="quote.ashx?t=SWC&ty=c&p=d&b=1">SWC</a>,
<a class="screener-link-primary" href="quote.ashx?t=NE&ty=c&p=d&b=1">NE</a>,
The regular expression:
import re

pattern = r"[A-Z][A-Z]{1,5}(?![A-Z])"
match = re.findall(pattern, thing1)
print(match)
The result I get contains two matches per line, because each ticker appears once in the href and once in the link text.
['IDXG', 'IDXG', 'INVN', 'INVN', 'SWC', 'SWC', 'NE', 'NE']
The result I want is only the first occurrence that matches the pattern in each line.
['IDXG', 'INVN', 'SWC', 'NE']
I know that without the global flag (in Python, re.search instead of re.findall), matching stops after the first match in the whole string.
And if I process each line separately, I get the first match of that line.
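For reference, the line-by-line approach could be sketched roughly as follows. The function name first_match_per_line is only illustrative, and thing1 is assumed to be a single multi-line string holding the data shown above:

```python
import re

# Sample data copied from the question; assumed to be one multi-line string.
thing1 = """<a class="screener-link-primary" href="quote.ashx?t=IDXG&ty=c&p=d&b=1">IDXG</a>,
<a class="screener-link-primary" href="quote.ashx?t=INVN&ty=c&p=d&b=1">INVN</a>,
<a class="screener-link-primary" href="quote.ashx?t=SWC&ty=c&p=d&b=1">SWC</a>,
<a class="screener-link-primary" href="quote.ashx?t=NE&ty=c&p=d&b=1">NE</a>,"""

# Same pattern as in the question: 2 to 6 uppercase letters,
# not followed by another uppercase letter.
pattern = re.compile(r"[A-Z][A-Z]{1,5}(?![A-Z])")


def first_match_per_line(text):
    """Return the first pattern match on each line (hypothetical helper)."""
    results = []
    for line in text.splitlines():
        m = pattern.search(line)  # search() stops at the first match
        if m:
            results.append(m.group())
    return results


print(first_match_per_line(thing1))
```

This works, but it needs an explicit loop, which is what I would like to avoid.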
Is there an elegant way to get the first occurrence of each line in Python?
Upvotes: 2
Views: 219
Reputation: 774
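Try this regex. It captures the ticker, skips the copy inside the href (which is followed by &), and then consumes the rest of the line so that each line yields a single match:
pattern = r"([A-Z]{2,6})(?![A-Z&]).*"
match = re.findall(pattern, thing1)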
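Another possible approach, sketched here on the assumption that thing1 is one multi-line string as in the question: anchor the pattern at the start of each line with re.MULTILINE and use a lazy prefix, so that findall can return at most one match per line. The helper name first_ticker_per_line is made up for this example.

```python
import re

# Sample data copied from the question; assumed to be one multi-line string.
thing1 = """<a class="screener-link-primary" href="quote.ashx?t=IDXG&ty=c&p=d&b=1">IDXG</a>,
<a class="screener-link-primary" href="quote.ashx?t=INVN&ty=c&p=d&b=1">INVN</a>,
<a class="screener-link-primary" href="quote.ashx?t=SWC&ty=c&p=d&b=1">SWC</a>,
<a class="screener-link-primary" href="quote.ashx?t=NE&ty=c&p=d&b=1">NE</a>,"""

# ^ matches at the start of every line because of re.MULTILINE.
# .*? lazily skips to the first run of 2 to 6 uppercase letters on that line.
# A second match on the same line is impossible, since it would also
# have to begin at the start of the line.
pattern = re.compile(r"^.*?([A-Z]{2,6})(?![A-Z])", re.MULTILINE)


def first_ticker_per_line(text):
    """Return the first captured ticker from each line (hypothetical helper)."""
    return pattern.findall(text)


print(first_ticker_per_line(thing1))
```

With a single capture group, re.findall returns just the captured strings, so no post-processing of the result should be needed.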
Upvotes: 0