Reputation: 123
I have a text file and want to extract the text between two strings ("StartString" and "EndString" in the example below) whenever a given substring occurs between them. There may be multiple such instances in the file. After that, I want to extract the first occurrence of "someID" in each of those instances. For example:
Example text file (text_data)
ghguy hja StartString I want this text (1) if substring 1 lies in between the two strings someID: abcd_efgh
ghsjgsajhgj someID: dgfshgj
EndString bhghk [jhbn] xxzh StartString I want this text (2) as a different variable if substring 2 lies in between the two strings ghdsdjsagdsh someID: fhcb7hkhb
ghjxcgsydgsdycgsjxcskcsal someID: ghyoet_fstj
EndString ghjyjgu
Output:
first_variable = I want this text (1) if substring 1 lies in between the two strings someID: abcd_efgh ghsjgsajhgj someID: dgfshgj
second_variable = I want this text (2) as a different variable if substring 2 lies in between the two strings ghdsdjsagdsh someID: fhcb7hkhb ghjxcgsydgsdycgsjxcskcsal someID: ghyoet_fstj
first occurrence of someID in first_variable = abcd_efgh
first occurrence of someID in second_variable = fhcb7hkhb
I tried extracting the first variable as:
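import re

target1 = 'StartString'
target2 = 'EndString'
# Non-greedy match between the two markers; re.DOTALL lets '.' span newlines
pat1 = '{}(.+?){}'.format(re.escape(target1), re.escape(target2))
pattern = re.compile(pat1, flags=re.DOTALL)
first_variable = pattern.findall(text_data)  # a list with one entry per block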
I have no idea how to extract the first occurrence of someID in each instance. Can anyone help me out with this?
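For reference, here is a minimal sketch of the behaviour described above. It is one possible approach, not a definitive one. It assumes that "substring exists" means a plain `in` check against a list of candidate substrings, and that an ID is the run of non-whitespace characters following "someID:". The names `extract_blocks`, `block_pattern`, and `id_pattern` are hypothetical and only used for illustration.

```python
import re

# Sample input copied from the example text file above
text_data = """ghguy hja StartString I want this text (1) if substring 1 lies in between the two strings someID: abcd_efgh
ghsjgsajhgj someID: dgfshgj
EndString bhghk [jhbn] xxzh StartString I want this text (2) as a different variable if substring 2 lies in between the two strings ghdsdjsagdsh someID: fhcb7hkhb
ghjxcgsydgsdycgsjxcskcsal someID: ghyoet_fstj
EndString ghjyjgu"""

# Non-greedy match so each StartString pairs with the nearest EndString
block_pattern = re.compile(r"StartString(.+?)EndString", flags=re.DOTALL)
# Assumption: an ID is the non-whitespace run after "someID:"
id_pattern = re.compile(r"someID:\s*(\S+)")


def extract_blocks(text, required_substrings):
    """Return (block_text, first_id) for each block containing any required substring."""
    results = []
    for block in block_pattern.findall(text):
        # Skip blocks that contain none of the substrings of interest
        if not any(s in block for s in required_substrings):
            continue
        # Collapse newlines and repeated spaces into single spaces
        cleaned = " ".join(block.split())
        # search() stops at the first match, i.e. the first someID in the block
        match = id_pattern.search(cleaned)
        results.append((cleaned, match.group(1) if match else None))
    return results


results = extract_blocks(text_data, ["substring 1", "substring 2"])
first_variable, first_id = results[0]
second_variable, second_id = results[1]
print(first_id)   # abcd_efgh
print(second_id)  # fhcb7hkhb
```

Using `search` rather than `findall` for the ID is what gives "first occurrence" semantics; if the IDs can contain characters other than non-whitespace runs, the `id_pattern` would need adjusting.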
Upvotes: 1
Views: 1398