Reputation: 3811
I have a DataFrame named train with a content column. Each row of the content column holds a list of words.
content
[sure, tune, …, watch, donald, trump, “,”, late, ’, night]
[abc, xyz, “,”, late, ’, night]
Code to remove the special characters with a regular expression
import re
train['content'] = train['content'].map(lambda x: re.sub(r'\W+', '', x))
Error
TypeError: expected string or bytes-like object
Expected output
content
[sure, tune, watch, donald, trump, late, night]
[abc, xyz, late, night]
Notice that all the special characters like ..., “, ” and ’ are gone and we are left only with words.
Upvotes: 0
Views: 1041
Reputation: 460
You are trying to apply a regular expression to a list object. re.sub expects a string (or bytes-like object), which is why you get the TypeError.
If your goal is to use this regex on every item of the list, you can apply re.sub to each item in the list:
import re
def replace_func(item):
    return re.sub(r'\W+', '', item)
train['content'] = train['content'].map(lambda x: [replace_func(item) for item in x])
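Note that items made up only of special characters (such as '…' or '’') become empty strings with this approach, so they stay in the list as ''. To match the expected output in the question, you would also need to drop them. A minimal sketch, assuming every item in each list is a string; clean_tokens and NON_WORD are names I made up for illustration:

```python
import re

# Matches runs of non-word characters (anything that is not a letter, digit or underscore).
NON_WORD = re.compile(r'\W+')

def clean_tokens(tokens):
    """Strip non-word characters from each token and drop tokens that end up empty."""
    stripped = (NON_WORD.sub('', token) for token in tokens)
    return [token for token in stripped if token]

row = ['sure', 'tune', '…', 'watch', 'donald', 'trump', '“,”', 'late', '’', 'night']
print(clean_tokens(row))
# ['sure', 'tune', 'watch', 'donald', 'trump', 'late', 'night']

# On the DataFrame from the question this would be applied per row, e.g.:
# train['content'] = train['content'].map(clean_tokens)
```

Keep in mind that \W+ also removes characters inside a word, so a token like "don't" would become "dont".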
Upvotes: 1
Reputation: 9
Just do:
import re
content = ['sure', 'tune', '…', 'watch', 'donald', 'trump', '“,”', 'late', '’', 'night']
content = list(map(lambda x: re.sub(r'\W+', '', x), content))
Upvotes: 0