Reputation: 11
I have a string containing many words, and I need to extract a specific part of it. The details are below.
Suppose I have the following string:
x = "I am amartya ccccc amartya xxxxx amartya yyyyy amartya mohan tagore bvfvhbvbv amartya vfvbvbvfhv amartya"
Now I want to extract the content between amartya and tagore, and the result should be exactly 'mohan', i.e. it matters which occurrence of amartya is used. I have used a regexp, but it gave me the following content:
"ccccc amartya xxxxx amartya yyyyy amartya mohan"
whereas I want only 'mohan' as my output.
Upvotes: 0
Views: 77
Reputation: 19431
This regular expression works for your specific example:
import re
r = re.search(r"(amartya)(?!.*amartya.*tagore)(.*)(tagore)", x)
r.group(2).strip()  # 'mohan'
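It basically says: match a pattern starting with "amartya" and ending with "tagore", where that "amartya" is not followed later by another "amartya" that itself comes before a "tagore". In other words, it picks the last "amartya" that still has a "tagore" after it.
The second group is the (.*), which matches anything between "amartya" and "tagore". Note that (.*) is greedy, so if the string contained more than one "tagore" it would run up to the last one.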
From the docs (re):
(?!...)
Matches if ... doesn't match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it's not followed by 'Asimov'.
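If the string might contain several "tagore" occurrences, the same negative lookahead can be applied one character at a time so the match can never run across another "amartya". This is only a sketch under the assumption that you want the nearest "amartya" ... "tagore" pair; the function name between_nearest is made up for illustration:

```python
import re

x = ("I am amartya ccccc amartya xxxxx amartya yyyyy amartya mohan tagore "
     "bvfvhbvbv amartya vfvbvbvfhv amartya")

# (?:(?!amartya).)*? consumes one character at a time, and refuses to
# step onto any position where another "amartya" begins.
pattern = re.compile(r"amartya((?:(?!amartya).)*?)tagore")

def between_nearest(text):
    """Return the text between the closest 'amartya' ... 'tagore' pair, or None."""
    m = pattern.search(text)
    return m.group(1).strip() if m else None

print(between_nearest(x))  # mohan
```

Unlike the greedy (.*) version, this stops at the first "tagore" after the chosen "amartya", and it returns None instead of raising when nothing matches.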
Hope that helps.
Upvotes: 2
Reputation: 647
In this case you could first split at "tagore", then split the first piece at "amartya" and take the last piece of the string:
x = "I am amartya ccccc amartya xxxxx amartya yyyyy amartya mohan tagore bvfvhbvbv amartya vfvbvbvfhv amartya"
x1 = x.split('tagore')[0]
print(x1)
#I am amartya ccccc amartya xxxxx amartya yyyyy amartya mohan
x2 = x1.split('amartya')[-1]
print(x2.strip(" "))
#mohan
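The same two steps can be wrapped in a small helper that also copes with a missing marker. This is just a sketch; between_markers and its parameter names are hypothetical, and it uses str.partition / str.rpartition instead of split so it can tell whether each marker was actually found:

```python
def between_markers(text, start="amartya", end="tagore"):
    """Text between the last `start` that precedes the first `end`, or None."""
    # Everything before the first occurrence of `end`.
    head, found_end, _ = text.partition(end)
    if not found_end:
        return None
    # Everything after the last occurrence of `start` within that head.
    _, found_start, tail = head.rpartition(start)
    if not found_start:
        return None
    return tail.strip()

x = ("I am amartya ccccc amartya xxxxx amartya yyyyy amartya mohan tagore "
     "bvfvhbvbv amartya vfvbvbvfhv amartya")
print(between_markers(x))  # mohan
```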
Upvotes: 1