Reputation: 197
I want to extract the text between two tags. The catch is that the first tag is fixed, but the second can be any tag from a given list.
For example, given the string
'TRSF BOOK TRANSFER CREDIT SND= abcd bank , 123 ORG= qwer123 OGB= qwerasd OBI= 123433'
I have a list of tags: ['TRSF','SND=','ORG=','OGB=','OBI=']
Edit: the list now includes the '=' for the tags that carry one.
My output should look something like this:
TRSF : BOOK TRANSFER CREDIT
SND : abcd bank , 123
ORG : qwer123
OGB : qwerasd
OBI : 123433
The order of the tags and which tags are present may change, and new tags may appear.
Until now I have been writing a separate regex or string-parsing snippet for each case, but that seems impractical because the number of combinations is unbounded.
Here is what I was doing:
org = re.findall("ORG=(.*?) OGB=",string_1)
snd = re.findall("SND=(.*?) ORG=",string_1)
_, _, obi = string_1.partition('OBI=')
Is there any way to do it like
<tag>(.*?)<tag in list>
or is there some other method?
Upvotes: 1
Views: 63
Reputation: 627488
If the tag list is complete, you can use a regex like
\b(TRSF|SND|ORG|OGB|OBI)\b=?\s*(.*?)(?=\s*\b(?:TRSF|SND|ORG|OGB|OBI)\b|\Z)
Details:
\b - a word boundary
(TRSF|SND|ORG|OGB|OBI) - a tag captured into Group 1
\b - a word boundary
=? - an optional =
\s* - zero or more whitespace characters
(.*?) - Group 2: any zero or more chars, as few as possible
(?=\s*\b(?:TRSF|SND|ORG|OGB|OBI)\b|\Z) - a positive lookahead that requires either the end of string (\Z) or zero or more whitespace characters followed by a tag as a whole word.
See the Python demo:
import re
s='TRSF BOOK TRANSFER CREDIT SND= abcd bank , 123 ORG= qwer123 OGB= qwerasd OBI= 123433'
tags = ['TRSF','SND','ORG','OGB','OBI']
print( dict(re.findall(fr'\b({"|".join(tags)})\b=?\s*(.*?)(?=\s*\b(?:{"|".join(tags)})\b|\Z)', s.strip(), re.DOTALL)) )
# => {'TRSF': 'BOOK TRANSFER CREDIT', 'SND': 'abcd bank , 123', 'ORG': 'qwer123', 'OGB': 'qwerasd', 'OBI': '123433'}
Note that re.DOTALL (equal to re.S) makes . match any character, including line break characters.
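Since the question's tag list may contain entries with a trailing '=' and may grow over time, the same idea can be wrapped in a small helper. This is only a sketch: parse_tags is a hypothetical name, and it assumes every tag starts and ends with a word character (so that \b works) and that no tag ever appears as a whole word inside a value.

```python
import re

def parse_tags(text, tags):
    """Hypothetical helper: return {tag: value} for the tags found in text."""
    # Accept tags written with or without a trailing '=' ('SND=' or 'SND').
    names = sorted({t.rstrip('=') for t in tags})
    # Escape each tag so characters special to regex are matched literally.
    alternation = "|".join(re.escape(n) for n in names)
    pattern = re.compile(
        rf'\b({alternation})\b=?\s*(.*?)(?=\s*\b(?:{alternation})\b|\Z)',
        re.DOTALL,  # let '.' cross line breaks inside a value
    )
    return dict(pattern.findall(text.strip()))

s = 'TRSF BOOK TRANSFER CREDIT SND= abcd bank , 123 ORG= qwer123 OGB= qwerasd OBI= 123433'
tags = ['TRSF', 'SND=', 'ORG=', 'OGB=', 'OBI=']
print(parse_tags(s, tags))
```

Because the pattern is built from the list, supporting a new tag only means adding it to tags. Tags that are missing from the string or that appear in a different order need no extra code.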
Upvotes: 1