Reputation: 21
I am trying to tokenize tweets, but I get the error: TypeError: expected string or bytes-like object
I am cleaning tweets for use in machine learning, so I am carrying out tokenization.
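import re
import numpy as np

# remove twitter handles (@user)
def remove_pattern(input_txt, pattern):
    # collect every substring that matches the pattern
    r = re.findall(pattern, input_txt)
    for i in r:
        # strip each matched handle from the text
        input_txt = re.sub(i, '', input_txt)
    return input_txt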
# remove twitter handles and create new column with clean tweet
data_df['cleaned_tweet'] = np.vectorize(remove_pattern)(data_df['text'], r"@[\w]*")
Upvotes: 2
Views: 486
Reputation: 41
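This happens because not every value in the tweet text column is a string. The column has dtype object, so it can hold non-string values (for example, a missing tweet stored as NaN, which is a float), and the re functions only accept strings or bytes. Convert the value to a string before matching by adding this as the first line of remove_pattern: input_txt = str(input_txt)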
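As a minimal sketch of that fix, assuming the failure comes from a non-string value such as NaN: the list `tweets` below is hypothetical sample data standing in for data_df['text'], and the `re.escape` call is an extra precaution that is not in the original function.

```python
import re


def remove_pattern(input_txt, pattern):
    # Cast first: re.findall raises TypeError for non-string input such as NaN.
    input_txt = str(input_txt)
    # Escape each match so it is removed as literal text, not reused as a regex.
    for match in re.findall(pattern, input_txt):
        input_txt = re.sub(re.escape(match), "", input_txt)
    return input_txt


# Hypothetical sample values; float("nan") mimics a missing tweet.
tweets = ["@user hello world", float("nan"), "thanks @a_b and @c"]
cleaned = [remove_pattern(t, r"@[\w]*") for t in tweets]
print(cleaned)
```

The same function can be passed to np.vectorize exactly as in the question. Note that str() turns a missing value into the literal text "nan", so you may prefer to drop or fill missing rows before cleaning.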
Upvotes: 4