neversaint

Reputation: 63994

Extracting entries from a line into a list in Python Regex

I have the following string:

myst="Cluster 2 0     13aa,>FZGRY:07872:11201...*1    13aa,>FZGRY:08793:13012...at100.00%2    13aa,>FZGRY:04065:08067...at100.00%"

What I want to do is extract the content bounded by > and ... into a list, yielding:

['FZGRY:07872:11201','FZGRY:08793:13012', 'FZGRY:04065:08067']

Why doesn't this code do the job?

import re
mem = re.findall(">(.*)\.\.\.",myst)
mem

What's the right way to do it?

Upvotes: 0

Views: 39

Answers (1)

nu11p01n73R

Reputation: 26667

You can use lookarounds to do this.

>>> re.findall(r'(?<=>)[^.]+(?=[.]{3})', myst)
['FZGRY:07872:11201', 'FZGRY:08793:13012', 'FZGRY:04065:08067']

Regex

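  • (?<=>) Positive lookbehind. Checks that the match is immediately preceded by >, without including the > in the match.

  • [^.]+ Matches one or more characters other than a literal dot. [^.] is any character except ., and + means one or more.

  • (?=[.]{3}) Positive lookahead. Checks that the match is immediately followed by ..., without including the dots in the match.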
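As a self-contained sketch of the lookaround pattern, the snippet below runs it on the string from the question. The second pattern, which uses a capture group instead of lookarounds, is an alternative added here for comparison and is not part of the original answer; the variable names are illustrative.

```python
import re

# Input string from the question, split across lines for readability.
myst = ("Cluster 2 0     13aa,>FZGRY:07872:11201...*1    "
        "13aa,>FZGRY:08793:13012...at100.00%2    "
        "13aa,>FZGRY:04065:08067...at100.00%")

# Lookaround version: the > and the ... are checked but never consumed,
# so findall returns only the text between them.
lookaround_ids = re.findall(r"(?<=>)[^.]+(?=[.]{3})", myst)

# Alternative (assumption, not from the answer): consume the delimiters
# and keep only the capture group, which findall returns on its own.
group_ids = re.findall(r">([^.]+)\.{3}", myst)

print(lookaround_ids)
print(group_ids)
```

Both calls should produce the same three-element list here, because none of the wanted entries contains a dot. If an entry could contain a dot, [^.]+ would stop too early and a non-greedy .*? would be the safer choice.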

What is wrong with your regex?

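  • >(.*)\.\.\. Here the .* is greedy and matches as much as possible, so it runs from the first > all the way to the last ... in the string and returns one long match. Add a ? after the * to make it non-greedy, so that each match stops at the nearest ... instead. Using a raw string (r"...") also avoids invalid-escape warnings for \. in newer Python versions.

    >>> re.findall(r">(.*?)\.\.\.", myst)
    ['FZGRY:07872:11201', 'FZGRY:08793:13012', 'FZGRY:04065:08067']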
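To make the difference concrete, here is a minimal sketch that runs the greedy and non-greedy patterns side by side on the string from the question. The variable names greedy and lazy are illustrative only.

```python
import re

# Input string from the question, split across lines for readability.
myst = ("Cluster 2 0     13aa,>FZGRY:07872:11201...*1    "
        "13aa,>FZGRY:08793:13012...at100.00%2    "
        "13aa,>FZGRY:04065:08067...at100.00%")

# Greedy: .* grabs everything up to the LAST "..." in the string,
# so findall yields a single long match that spans all three entries.
greedy = re.findall(r">(.*)\.\.\.", myst)

# Non-greedy: .*? stops at the FIRST "..." after each >,
# so findall yields one match per entry.
lazy = re.findall(r">(.*?)\.\.\.", myst)

print(len(greedy))  # 1
print(greedy[0])
print(len(lazy))    # 3
print(lazy)
```

The greedy result starts at FZGRY:07872:11201 and ends at FZGRY:04065:08067, with the other two delimiters swallowed in between, which is why the original call did not return three separate entries.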
    

Upvotes: 3
