Siddharth Sah

Reputation: 21

Finding the string between two substrings

I have several very long text documents, each of which I read into a single string. The documents contain labels that mark emotions, and I have to extract all of them. I am having a problem with the re.findall function in Python: it works perfectly fine on ordinary strings, but my string contains special characters, and I think that is why I get nothing back. For example:

string = ['yeah i\'ll get her going and and after you\'re done with your survey and stuff (00)\n<TRIGGER AFFECT="SURPRISED" SCALE="MEDIUM">oh</TRIGGER> okay (01)\n<TRIGGER AFFECT="CONFUSED" SCALE="LOW">okay</TRIGGER> ']

I have to find all the word/words between the <TRIGGER AFFECT="SURPRISED" SCALE="MEDIUM"> and </TRIGGER>.

match = re.findall("<TRIGGER AFFECT="SURPRISED" SCALE="MEDIUM"> (.*?) </TRIGGER>",i)
print (match)

It is not working. I guess it has something to do with the < and " characters; if I use the same code with any other normal string, it works.

Upvotes: 2

Views: 58

Answers (1)

Barmar

Reputation: 780974

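Fix your quotes: the pattern itself contains double quotes, so delimit it with single quotes (or escape the inner ones). Also get rid of the spaces in the regexp that don't have a match in the input string. It's also usually a good idea to use a raw string for the regexp.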

match = re.findall(r'<TRIGGER AFFECT="SURPRISED" SCALE="MEDIUM">(.*?)</TRIGGER>', i)
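For reference, here is a small self-contained sketch of that fix run against the sample text from the question. The variable names (`text`, `surprised`, `all_triggers`) and the second, generalized pattern are assumptions added for illustration and are not part of the original question; the generalized pattern assumes every tag has exactly the `AFFECT` and `SCALE` attributes, in that order.

```python
import re

# Sample text from the question, as one document string.
text = (
    "yeah i'll get her going and and after you're done with your survey and stuff (00)\n"
    '<TRIGGER AFFECT="SURPRISED" SCALE="MEDIUM">oh</TRIGGER> okay (01)\n'
    '<TRIGGER AFFECT="CONFUSED" SCALE="LOW">okay</TRIGGER> '
)

# Exact tag: single-quoted raw string, no extra spaces around the group.
surprised = re.findall(
    r'<TRIGGER AFFECT="SURPRISED" SCALE="MEDIUM">(.*?)</TRIGGER>', text
)
print(surprised)

# Hypothetical generalization: capture the affect, the scale, and the text
# of every TRIGGER tag. re.DOTALL lets the tagged text span line breaks.
all_triggers = re.findall(
    r'<TRIGGER AFFECT="([^"]*)" SCALE="([^"]*)">(.*?)</TRIGGER>',
    text,
    flags=re.DOTALL,
)
print(all_triggers)
```

With several capture groups, `re.findall` returns a list of tuples, one per tag, so each match carries its affect and scale alongside the tagged words. If the documents are kept in a list, as in the question, the same calls can be applied to each element in a loop.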

Upvotes: 4
