Reputation: 21
I have very long text documents that I read in, and each document is converted into one single string. These documents contain labels with emotions, and I have to extract all of them. I am having a problem with the re.findall function in Python: it works perfectly fine with ordinary strings, but my string contains special characters, and I think that is why I am getting nothing back. For example:
string = ['yeah i\'ll get her going and and after you\'re done with your survey and stuff (00)\n<TRIGGER AFFECT="SURPRISED" SCALE="MEDIUM">oh</TRIGGER> okay (01)\n<TRIGGER AFFECT="CONFUSED" SCALE="LOW">okay</TRIGGER> ']
I have to find the word or words between <TRIGGER AFFECT="SURPRISED" SCALE="MEDIUM"> and </TRIGGER>.
match = re.findall("<TRIGGER AFFECT="SURPRISED" SCALE="MEDIUM"> (.*?) </TRIGGER>",i)
print (match)
It is not working. I guess it has something to do with the < and " characters, because if I use the same code with any other normal string instead, it works.
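For reference, here is a minimal sketch of a call that does match, under two assumptions about what goes wrong above. First, the pattern is wrapped in double quotes while also containing double quotes, so the string literal ends early. Second, the literal spaces around (.*?) require a space after the opening tag and before the closing tag, which the data does not have. Neither < nor " is a special character in Python regular expressions. The names text, surprised and all_labels are only illustrative.

```python
import re

# Sample text copied from the question, as one plain string (not a list).
text = (
    "yeah i'll get her going and and after you're done with your survey "
    'and stuff (00)\n<TRIGGER AFFECT="SURPRISED" SCALE="MEDIUM">oh</TRIGGER> '
    'okay (01)\n<TRIGGER AFFECT="CONFUSED" SCALE="LOW">okay</TRIGGER> '
)

# Single-quoted raw string, so the double quotes need no escaping.
# No literal spaces around the group; \s* tolerates optional padding.
surprised = re.findall(
    r'<TRIGGER AFFECT="SURPRISED" SCALE="MEDIUM">\s*(.*?)\s*</TRIGGER>',
    text,
    flags=re.DOTALL,  # let . cross newlines in case a label spans lines
)
print(surprised)  # ['oh']

# Generalised version: capture affect, scale and enclosed text for every label.
all_labels = re.findall(
    r'<TRIGGER AFFECT="([^"]*)" SCALE="([^"]*)">\s*(.*?)\s*</TRIGGER>',
    text,
    flags=re.DOTALL,
)
print(all_labels)  # [('SURPRISED', 'MEDIUM', 'oh'), ('CONFUSED', 'LOW', 'okay')]
```

If the documents are kept in a list as in the example, the same call can be run on each element in a loop. For markup that nests or varies more than this, a proper parser may be more robust than a regular expression.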
Upvotes: 2
Views: 58