Reputation: 311
I'm trying to extract the field names and their values from a string containing fields and values like the following one:
/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV
Each string can contain a different number of fields
The field names will always be enclosed between '/' and '='
The values can contain '/', whitespace and (as in the comment field above) '=', but never a '/' immediately followed by a field name and '='
The expected result is something like:
['location','(7966, 8580, 1)','station','NY','comment','Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV']
So far I've been able to extract the field names using:
>>> re.findall(r"\/([a-z]*?)\=",string)
['location', 'station', 'comment']
And I've tried to use a negative lookahead (?!...) to capture the values as well, without success.
Thanks in advance!
Upvotes: 2
Views: 2154
Reputation: 1267
Just use re.split() with a capturing group around the field name:
>>> string
'/location=(7966, 8580, 1) /station=NY /comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV'
>>> import re
>>> pattern = re.compile(r'\s*/([a-z]+)=')
>>> pattern.split(string)[1:]
['location', '(7966, 8580, 1)', 'station', 'NY', 'comment', 'Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV']
From the documentation for re.split(pattern, string, maxsplit=0, flags=0):
Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
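For reuse outside the interactive session, the same idea can be wrapped in a small function. This is only a minimal sketch: the names FIELD_RE, split_fields and pair_fields are made up for illustration, and it assumes field names consist of lowercase letters only, as in the pattern above.

```python
import re

# Same pattern as in the answer: optional whitespace, '/', a lowercase
# field name (captured), then '='. The helper names are hypothetical.
FIELD_RE = re.compile(r'\s*/([a-z]+)=')


def split_fields(text):
    """Return a flat list [name1, value1, name2, value2, ...]."""
    parts = FIELD_RE.split(text)
    # parts[0] is whatever precedes the first field (an empty string
    # for the sample input), so it is dropped.
    return parts[1:]


def pair_fields(text):
    """Return the same data as a list of (name, value) tuples."""
    flat = split_fields(text)
    # Even positions hold names, odd positions hold values.
    return list(zip(flat[0::2], flat[1::2]))


sample = ("/location=(7966, 8580, 1) /station=NY "
          "/comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV")

print(split_fields(sample))
print(pair_fields(sample))
```

Because the field name sits in a capturing group, re.split() keeps it in the result list between the values, which is why no second pass over the string is needed.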
Upvotes: 1
Reputation: 473863
You can use re.split() to first split the "key=value" pairs, then regular str.split() to split each pair at the first occurrence of =:
>>> dict(item.split("=", 1) for item in re.split(r"\s*/(?=[a-z]*?\=)", s)[1:])
{
'comment': 'Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV',
'station': 'NY',
'location': '(7966, 8580, 1)'
}
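As a self-contained variant of the one-liner, the sketch below spells out the two steps. The function name parse_fields is hypothetical, and the lookahead uses [a-z]+ instead of [a-z]*? so that a bare "/=" is not treated as a field; that is a small deviation from the pattern above, not part of the original answer.

```python
import re


def parse_fields(text):
    """Parse '/name=value /name=value ...' into a dict (hypothetical helper)."""
    # Step 1: split at every '/' that is directly followed by 'name='.
    # The lookahead (?=...) checks for the name without consuming it,
    # so each chunk still starts with 'name='.
    chunks = re.split(r"\s*/(?=[a-z]+=)", text)
    # chunks[0] is the text before the first field (empty here); skip it.
    # Step 2: split each chunk at the first '=' only, so any later '='
    # stays inside the value.
    return dict(chunk.split("=", 1) for chunk in chunks[1:])


sample = ("/location=(7966, 8580, 1) /station=NY "
          "/comment=Protein RadB n=1 Tax=M (SB / ATCC) RepID=A6USB2_METV")

fields = parse_fields(sample)
print(fields["station"])
print(fields["comment"])
```

Note that a dict keeps only the last value if a field name appears twice in the same string; the flat list from re.split() with a capturing group keeps every occurrence.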
Upvotes: 3