Extract fields and values from string in Python

Question

I'm trying to extract the field names and values from a string containing fields and values like the following one:

/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV

Each string can contain a different number of fields
The field names will always be enclosed between '/' and '='
The values can contain '/' and whitespace but not '='

The expected result is something like:

['location','(7966, 8580, 1)','station','NY','comment','Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV']

So far I've been able to extract the field names using:

>>> re.findall(r"\/([a-z]*?)\=",string)
['location', 'station', 'comment']

I've also tried to use a negative lookahead (?!...) to capture the values, without success.

Thanks in advance!

Lee HoYo · Accepted Answer

Just use re.split():

>>> string
'/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV'
>>> import re
>>> pattern = re.compile(r'\s*/([a-z]+)=')
>>> pattern.split(string)[1:]
['location', '(7966, 8580, 1)', 'station', 'NY', 'comment', 'Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV']

From the Python documentation:

re.split(pattern, string, maxsplit=0, flags=0)

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
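
If a mapping is more convenient than a flat list, the alternating names and values that re.split() returns can be paired up. The following is only a minimal sketch: parse_fields is a hypothetical helper name, not part of the re module, and it assumes each field name appears at most once per string (a repeated name would keep only its last value).

```python
import re

# Same pattern as above: optional whitespace, '/', a lowercase name, '='.
FIELD = re.compile(r'\s*/([a-z]+)=')

def parse_fields(text):
    """Return a dict of field name -> value (hypothetical helper)."""
    # Because the pattern has one capturing group, split() yields
    # [text_before_first_field, name1, value1, name2, value2, ...].
    # Drop the leading piece, which is '' when the string starts with a field.
    parts = FIELD.split(text)[1:]
    # Even positions hold the names, odd positions hold the values.
    return dict(zip(parts[0::2], parts[1::2]))

s = ('/location=(7966, 8580, 1) /station=NY '
     '/comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV')
fields = parse_fields(s)
print(fields['station'])  # NY
```

The '/' inside "(SB / ATCC)" is not treated as a field start because it is not immediately followed by a lowercase name and '=', and "n=1" is ignored because no '/' precedes it.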
