Reputation: 21
I am doing sentiment analysis and I want to add NOT to every word between a negation word and the following punctuation mark (the code below uses the prefix NEG_). I am running the following code:
import re
fin=open("aboveE1.txt",'r', encoding='UTF-8')
transformed = re.sub(r'\b(?:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)\b[\w\s]+[^\w\s]',
lambda match: re.sub(r'(\s+)(\w+)', r'\1NEG_\2', match.group(0)),
fin,
flags=re.IGNORECASE)
Traceback (most recent call last):
  line 14, in <module>
    flags=re.IGNORECASE)
  line 182, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
I don't know how to fix this error. Can you help me?
Upvotes: 1
Views: 1338
Reputation: 2798
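re.sub takes a string, not a file object, as its third argument. open() returns a file object, so you need to pass re.sub the text read from the file instead. See the re.sub documentation.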
import re

transformed = ''
# A with block closes the file automatically, even if an error is raised
with open("aboveE1.txt", 'r', encoding='UTF-8') as fin:
    # Iterating over the file yields one string per line, which re.sub accepts
    for line in fin:
        transformed += re.sub(r'\b(?:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)\b[\w\s]+[^\w\s]',
                              lambda match: re.sub(r'(\s+)(\w+)', r'\1NEG_\2', match.group(0)),
                              line,
                              flags=re.IGNORECASE)
        # No need to append '\n' to 'transformed'
        # because the line returned via the iterator includes the '\n'
Also remember to always close any file you open, either by calling fin.close() or by opening it in a with block, which closes it automatically.
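One caveat with going line by line: the pattern only matches when punctuation follows the negation word, so a negated phrase that continues onto the next line is left unmarked. If that matters for your data, an alternative is to read the whole file into a single string with fin.read() and run the substitution once. Below is a minimal sketch of that approach. The names NEGATION_SPAN, mark_negation and transform_file are only illustrative and are not from the question; the pattern itself is the one from the question, split across lines for readability.

```python
import re

# Same pattern as in the question: a negation word, then words/whitespace,
# up to and including the next punctuation character.
NEGATION_SPAN = re.compile(
    r'\b(?:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|'
    r'couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)'
    r'\b[\w\s]+[^\w\s]',
    flags=re.IGNORECASE)


def mark_negation(text):
    """Prefix each word after a negation word with NEG_, up to the next punctuation."""
    return NEGATION_SPAN.sub(
        lambda match: re.sub(r'(\s+)(\w+)', r'\1NEG_\2', match.group(0)),
        text)


def transform_file(path):
    """Read the whole file as one string (hypothetical helper), then transform it."""
    with open(path, 'r', encoding='UTF-8') as fin:
        return mark_negation(fin.read())


print(mark_negation("I dont like this movie, but the cast is great."))
# I dont NEG_like NEG_this NEG_movie, but the cast is great.
```

With this in place, transformed = transform_file("aboveE1.txt") would replace the loop. Note that fin.read() loads the entire file into memory, so the line-by-line version may still be preferable for very large files.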
Upvotes: 1