Jay Lord
Reputation: 57

Using regular expressions to match the first pattern occurrence in each line

I'm using regular expressions in Python to parse the data below.

The data, stored in thing1:

<a class="screener-link-primary" href="quote.ashx?t=IDXG&amp;ty=c&amp;p=d&amp;b=1">IDXG</a>, 
<a class="screener-link-primary" href="quote.ashx?t=INVN&amp;ty=c&amp;p=d&amp;b=1">INVN</a>, 
<a class="screener-link-primary" href="quote.ashx?t=SWC&amp;ty=c&amp;p=d&amp;b=1">SWC</a>, 
<a class="screener-link-primary" href="quote.ashx?t=NE&amp;ty=c&amp;p=d&amp;b=1">NE</a>, 

The regular expression

import re

pattern = r"[A-Z][A-Z]{1,5}(?![A-Z])"
match = re.findall(pattern, thing1)
print(match)

The result I get contains both occurrences from every line:

['IDXG', 'IDXG', 'INVN', 'INVN', 'SWC', 'SWC', 'NE', 'NE']

The result I want contains only the first occurrence that matches the pattern in each line:

['IDXG', 'INVN', 'SWC', 'NE']

I know that if I remove the global tag, it stops after one match.

And if I do each line separately, it'll give me the first match.

Is there an elegant way to get the first occurrence of each line in Python?
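
For reference, the "each line separately" approach mentioned above can be sketched roughly as follows. This is only an illustration: the name first_per_line is made up for the example, and thing1 is rebuilt here from the sample data so the snippet runs on its own.

```python
import re

# Sample data from the question, one anchor tag per line.
thing1 = (
    '<a class="screener-link-primary" href="quote.ashx?t=IDXG&amp;ty=c&amp;p=d&amp;b=1">IDXG</a>, \n'
    '<a class="screener-link-primary" href="quote.ashx?t=INVN&amp;ty=c&amp;p=d&amp;b=1">INVN</a>, \n'
    '<a class="screener-link-primary" href="quote.ashx?t=SWC&amp;ty=c&amp;p=d&amp;b=1">SWC</a>, \n'
    '<a class="screener-link-primary" href="quote.ashx?t=NE&amp;ty=c&amp;p=d&amp;b=1">NE</a>, \n'
)

pattern = r"[A-Z][A-Z]{1,5}(?![A-Z])"

# re.search stops at the first match, so running it once per line
# keeps only the first occurrence on that line.
first_per_line = []
for line in thing1.splitlines():
    m = re.search(pattern, line)
    if m:
        first_per_line.append(m.group())

print(first_per_line)  # ['IDXG', 'INVN', 'SWC', 'NE']
```

It works, but it needs an explicit loop, which is what I'd like to avoid.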

Upvotes: 2

Views: 219

Answers (2)

Mustofa Rizwan
Reputation: 10476

Just add a < to the character class in your negative lookahead, so the link text (which is followed by </a>) no longer matches:

[A-Z]{1,5}(?![A-Z<])

Your updated link
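
A quick way to check this in Python is sketched below. It assumes thing1 holds the sample data exactly as posted in the question (it is rebuilt here so the snippet is self-contained), and the variable name matches is just for illustration.

```python
import re

# Sample data from the question, one anchor tag per line.
thing1 = (
    '<a class="screener-link-primary" href="quote.ashx?t=IDXG&amp;ty=c&amp;p=d&amp;b=1">IDXG</a>, \n'
    '<a class="screener-link-primary" href="quote.ashx?t=INVN&amp;ty=c&amp;p=d&amp;b=1">INVN</a>, \n'
    '<a class="screener-link-primary" href="quote.ashx?t=SWC&amp;ty=c&amp;p=d&amp;b=1">SWC</a>, \n'
    '<a class="screener-link-primary" href="quote.ashx?t=NE&amp;ty=c&amp;p=d&amp;b=1">NE</a>, \n'
)

# The ticker inside href is followed by "&", so it matches.
# The ticker in the link text is followed by "<", which the
# lookahead now rejects, so it is skipped.
pattern = r"[A-Z]{1,5}(?![A-Z<])"
matches = re.findall(pattern, thing1)

print(matches)  # ['IDXG', 'INVN', 'SWC', 'NE']
```

Note that this relies on the markup having the shape shown in the question. If a ticker in the href were ever followed directly by <, or the link text by something else, the pattern would need adjusting.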

Upvotes: 1

Rohan Amrute
Reputation: 774

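Try this regex. It captures the first match and then consumes the rest of the line, so the second occurrence is never seen:

# The group captures the first ticker; .* eats the remainder of the line
# (. does not match a newline by default).
pattern = r"([A-Z][A-Z]{1,5})(?![A-Z]).*"
match = re.findall(pattern, thing1)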

Upvotes: 0
