user10256551

Reputation: 31

Python: how to extract the text between two known words in a string?

How can I extract the text between two known words in a string, with the condition that the text between these words can be (i) one character, (ii) one word, (iii) two words, and so on?

Sample Text:

text = ("MNOTES - GEO GEO MNOTES 20 231-0005 GEO GEO GEO GEO GEO MNOTES SOME REVISION MNOTES CASUAL C GEO GEO GEO GEO GEO MNOTES F232322500 MNOTES HELP PAGES GEO GEO GEO GEO MNOTES SHEET 1 OF 3 GEO GEO MNOTES CASUAL E. GEO GEO MNOTES SITPOPE/TIN AY GEO GEO MNOTES R GEO GEO GEO GEO MNOTES 22+0436/T.SKI/11-AUG-1986 GEO GEO GEO GEO MNOTES 231-0045 GEO")

I have a string like the one above that has multiple occurrences of the two known words 'MNOTES' and 'GEO'. The text between them can be anything, with any number of words.

Sometimes I want to extract only the text that has a single character between the two known words, sometimes the text that has two words between them, sometimes the text that has six words between them, and so on. How can I extract the text along with such a condition?

Upvotes: 3

Views: 4610

Answers (1)

Jab

Reputation: 27515

Use re.findall with a non-greedy capture group between the two words.

import re

re.findall('MNOTES(.*?)GEO', text)

This results in:

[' - ', ' 20 231-0005 ', ' SOME REVISION MNOTES CASUAL C ', ' F232322500 MNOTES HELP PAGES ', ' SHEET 1 OF 3 ', ' CASUAL E. ', ' SITPOPE/TIN AY ', ' R ', ' 22+0436/T.SKI/11-AUG-1986 ', ' 231-0045 ']
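Two of the results above, ' SOME REVISION MNOTES CASUAL C ' and ' F232322500 MNOTES HELP PAGES ', span a second 'MNOTES', because the non-greedy group runs from the first 'MNOTES' it finds to the next 'GEO'. If only the text after the nearest 'MNOTES' is wanted (an assumption, since the question does not say), a negative lookahead can stop the group from crossing another 'MNOTES'. A minimal sketch; the name nearest_pairs is hypothetical, and the results are stripped of surrounding spaces:

```python
import re

text = ("MNOTES - GEO GEO MNOTES 20 231-0005 GEO GEO GEO GEO GEO MNOTES SOME REVISION MNOTES CASUAL C GEO GEO GEO GEO GEO MNOTES F232322500 MNOTES HELP PAGES GEO GEO GEO GEO MNOTES SHEET 1 OF 3 GEO GEO MNOTES CASUAL E. GEO GEO MNOTES SITPOPE/TIN AY GEO GEO MNOTES R GEO GEO GEO GEO MNOTES 22+0436/T.SKI/11-AUG-1986 GEO GEO GEO GEO MNOTES 231-0045 GEO")

# (?:(?!MNOTES).)*? consumes one character at a time, lazily, and refuses
# to consume a character at any position where another 'MNOTES' begins.
pattern = re.compile(r'MNOTES((?:(?!MNOTES).)*?)GEO')

# Assumption: surrounding whitespace is not part of the wanted text.
nearest_pairs = [m.strip() for m in pattern.findall(text)]
print(nearest_pairs)
```

With this pattern, 'SOME REVISION' and 'F232322500' are dropped, since each is followed by another 'MNOTES' before any 'GEO'.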

Edit

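To get only matches with a specific number of characters between the two words, add a quantifier (the raw string keeps Python from treating \s as a string escape):

re.findall(r'MNOTES\s?(.{1})\s?GEO', text)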

This results in:

['-', 'R']

To get only results that are 6 to 8 characters long:

re.findall(r'MNOTES\s?(.{6,8})\s?GEO', text)

Results:

['- GEO ', 'CASUAL C', 'R GEO ', '231-0045']
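The length-bounded patterns above count characters, and since . also matches the letters of 'GEO', results such as '- GEO ' and 'R GEO ' swallow a delimiter. For the conditions in the question (one character, one word, two words, and so on), one option is to extract every segment first and filter in Python afterwards. A minimal sketch under that assumption; the helper names segments_between and with_word_count are hypothetical, and a "word" is taken to mean a whitespace-separated token:

```python
import re

text = ("MNOTES - GEO GEO MNOTES 20 231-0005 GEO GEO GEO GEO GEO MNOTES SOME REVISION MNOTES CASUAL C GEO GEO GEO GEO GEO MNOTES F232322500 MNOTES HELP PAGES GEO GEO GEO GEO MNOTES SHEET 1 OF 3 GEO GEO MNOTES CASUAL E. GEO GEO MNOTES SITPOPE/TIN AY GEO GEO MNOTES R GEO GEO GEO GEO MNOTES 22+0436/T.SKI/11-AUG-1986 GEO GEO GEO GEO MNOTES 231-0045 GEO")


def segments_between(s, start="MNOTES", end="GEO"):
    """Return the stripped text between each start word and the next end word."""
    # re.escape guards against delimiters that contain regex metacharacters.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return [m.strip() for m in re.findall(pattern, s)]


def with_word_count(segments, n):
    """Keep only segments made of exactly n whitespace-separated tokens."""
    return [seg for seg in segments if len(seg.split()) == n]


segments = segments_between(text)

one_char = [seg for seg in segments if len(seg) == 1]
one_word = with_word_count(segments, 1)
two_words = with_word_count(segments, 2)

print(one_char)   # ['-', 'R']
print(one_word)   # ['-', 'R', '22+0436/T.SKI/11-AUG-1986', '231-0045']
print(two_words)  # ['20 231-0005', 'CASUAL E.', 'SITPOPE/TIN AY']
```

Filtering after extraction keeps the regex simple and turns each condition into an ordinary Python expression, so a six-word condition is just with_word_count(segments, 6).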

Upvotes: 4
