ppflrs

Reputation: 311

Extract fields and values from string in Python

I'm trying to extract the field names and values from a string like the following one:

/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV

The expected result is something like:

['location','(7966, 8580, 1)','station','NY','comment','Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV']

So far I've been able to extract the field names using:

>>> re.findall(r"\/([a-z]*?)\=",string)
['location', 'station', 'comment']

I've also tried using a negative lookahead (?!...) to capture the values, without success.

Thanks in advance!

Upvotes: 2

Views: 2154

Answers (2)

Lee HoYo

Reputation: 1267

Just use re.split() with a capturing group around the field name:

>>> string
'/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV'
>>> import re
>>> pattern = re.compile(r'\s*/([a-z]+)=')
>>> pattern.split(string)[1:]
['location', '(7966, 8580, 1)', 'station', 'NY', 'comment', 'Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV']

From the documentation:

re.split(pattern, string, maxsplit=0, flags=0)

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
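As a rough illustration of that documented behaviour, the sketch below compares a non-capturing group with a capturing one, then pairs the alternating name/value items into a dict. The variable names (s, values_only, parts, fields) are only illustrative, and the pattern assumes field names are lowercase ASCII letters, as in the question.

```python
import re

s = "/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV"

# Non-capturing group: the field names are consumed by the split and dropped.
values_only = re.split(r"\s*/(?:[a-z]+)=", s)[1:]

# Capturing group: the field names are kept, interleaved with the values.
# The [1:] drops the empty string before the first match.
parts = re.split(r"\s*/([a-z]+)=", s)[1:]

# Pair the alternating name/value items (even indexes are names, odd are values).
fields = dict(zip(parts[0::2], parts[1::2]))

print(values_only)
print(fields)
```

The " / " inside the comment value is not treated as a separator, because the pattern requires at least one lowercase letter followed by = directly after the slash.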

Upvotes: 1

alecxe

Reputation: 473863

You can use re.split() with a lookahead to first separate the "key=value" pairs, then use a regular str.split() on the first occurrence of = within each pair (here s is the input string):

>>> dict(item.split("=", 1) for item in re.split(r"\s*/(?=[a-z]*?\=)", s)[1:])
{
  'comment': 'Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV', 
  'station': 'NY', 
  'location': '(7966, 8580, 1)' 
}
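Since the question mentions trying a lookahead, here is a possible sketch of the same idea with re.findall(): a positive lookahead stops each value just before the next "/name=" or at the end of the string. This is an alternative to the split above rather than part of it. The names s, pairs and flat are illustrative, and the pattern assumes lowercase field names and a single-line input.

```python
import re
from itertools import chain

s = "/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV"

# Group 1: field name. Group 2: value, matched lazily until the lookahead
# sees the next "/name=" (with optional leading whitespace) or the end of the string.
pairs = re.findall(r"/([a-z]+)=(.*?)(?=\s*/[a-z]+=|$)", s)

# Flatten the (name, value) tuples into the flat list asked for in the question.
flat = list(chain.from_iterable(pairs))

print(pairs)
print(flat)
```

Passing pairs to dict() gives the same mapping as the split-based approach.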

Upvotes: 3
