Reputation: 11

How to extract a specific part from text

I have a string containing many words, and I need to extract a specific part of it. The details are below:

Suppose I have the following string:

x = "I am amartya ccccc amartya xxxxx amartya yyyyy amartya mohan tagore bvfvhbvbv amartya vfvbvbvfhv amartya"

Now I want to extract the content between amartya and tagore, but the result should be exactly 'mohan'; that is, it matters which occurrence of amartya is used. I have used a regexp, but it gave me the following: "ccccc amartya xxxxx amartya yyyyy amartya mohan". I want only 'mohan' as my output.

Upvotes: 0

Answers (2)

Tomerikoo

Reputation: 19431

This regular expression works for your specific example:

import re

r = re.search("(amartya)(?!.*amartya.*tagore)(.*)(tagore)", x)
r.group(2).strip()  # 'mohan'

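It basically says: match "amartya" only where no further "amartya" followed by "tagore" appears later in the string, then capture everything up to "tagore". In effect, the text between the matched "amartya" and "tagore" doesn't contain the word "amartya" again.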

The second group is the (.*) which matches anything between "amartya" and "tagore"
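If you need this for arbitrary start and end words, one possible variant (not the pattern above, but a different technique sometimes called a "tempered" dot) checks at every character that the start word does not begin again. This is only a sketch: the helper name extract_between is made up for illustration, and it assumes the text between the markers contains no newlines, since . does not match them by default.

```python
import re

def extract_between(text, start, end):
    # Hypothetical helper: return the text between `start` and `end`
    # such that `start` does not occur again in between.
    # (?:(?!start).)*? consumes one character at a time, but only
    # while `start` does not begin at the current position.
    pattern = (
        re.escape(start)
        + r"((?:(?!" + re.escape(start) + r").)*?)"
        + re.escape(end)
    )
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None

x = "I am amartya ccccc amartya xxxxx amartya yyyyy amartya mohan tagore bvfvhbvbv amartya vfvbvbvfhv amartya"
result = extract_between(x, "amartya", "tagore")
print(result)  # mohan
```

The re.escape calls are there so that start or end words containing regex metacharacters are treated literally.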

From the docs (re):

(?!...)

Matches if ... doesn’t match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not followed by 'Asimov'.
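To see the lookahead on its own, here is a small sketch of the example from the docs. The variable names are just for illustration.

```python
import re

pattern = re.compile(r"Isaac (?!Asimov)")

# 'Isaac ' is matched because it is not followed by 'Asimov'.
newton = pattern.search("Isaac Newton")
# No match here: the lookahead rejects 'Isaac ' followed by 'Asimov'.
asimov = pattern.search("Isaac Asimov")

print(newton is not None)  # True
print(asimov is not None)  # False
```

Note that the lookahead itself consumes nothing: when the match succeeds, the matched text is only 'Isaac '.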

Hope that helps.

Upvotes: 2

Mig B

Reputation: 647

In this case you could first split at "tagore", then split the first piece at "amartya" and take the last part of the result:

x = "I am amartya ccccc amartya xxxxx amartya yyyyy amartya mohan tagore bvfvhbvbv amartya vfvbvbvfhv amartya"

x1 = x.split('tagore')[0]
print(x1)
#I am amartya ccccc amartya xxxxx amartya yyyyy amartya mohan 
x2 = x1.split('amartya')[-1]
print(x2.strip(" "))
#mohan
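The same idea can be wrapped in a small function. This is just a sketch: the name extract_last_before is made up, and it uses str.partition and str.rpartition instead of split so that a missing marker can be detected instead of silently returning the wrong piece.

```python
def extract_last_before(text, start, end):
    # Hypothetical helper: take everything before the first `end`,
    # then keep what follows the last `start` inside that piece.
    head, found_end, _ = text.partition(end)
    if not found_end:
        return None  # `end` does not occur in the text
    _, found_start, tail = head.rpartition(start)
    if not found_start:
        return None  # no `start` before the first `end`
    return tail.strip()

x = "I am amartya ccccc amartya xxxxx amartya yyyyy amartya mohan tagore bvfvhbvbv amartya vfvbvbvfhv amartya"
print(extract_last_before(x, "amartya", "tagore"))  # mohan
```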

Upvotes: 1
