noob

Reputation: 3811

TypeError: expected string or bytes-like object when removing special characters with a regular expression

I have a DataFrame named train with a content column. Each row of that column holds a list of words.

content
[sure, tune, …, watch, donald, trump, “,”, late, ’ , night]
[abc, xyz, “,”,late, ’, night]

Code to remove the special characters

import re
train['content'] = train['content'].map(lambda x: re.sub(r'\W+', '', x))

Error

TypeError: expected string or bytes-like object

Expected output

content
[sure, tune, watch, donald, trump, late, night]
[abc, xyz, late, night]

Notice that all the special characters like …, “,” and ’ are gone and only the words are left.

Upvotes: 0

Views: 1041

Answers (2)

ztepler

Reputation: 460

You are applying the regular expression to a list object, but re.sub expects a string (or bytes-like object).
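A quick way to confirm this is to call re.sub on one whole cell and then on a single item. The sketch below is only an illustration: row is a made-up stand-in for one cell of the content column, and try_sub is a hypothetical helper, not part of the question's code.

```python
import re

# Hypothetical stand-in for one cell of the content column.
row = ["abc", "xyz", "“,”", "late", "’", "night"]

def try_sub(value):
    """Return the cleaned string, or the name of the exception that was raised."""
    try:
        return re.sub(r"\W+", "", value)
    except TypeError as exc:
        return type(exc).__name__

print(try_sub(row))     # the whole list is rejected by re.sub
print(try_sub(row[0]))  # a single string is accepted
```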

If your goal is to use this regex on every item of the list, you can apply re.sub to each item in the list:

import re
def replace_func(item):
    return re.sub(r'\W+', '', item)

train['content'] = train['content'].map(lambda x: [replace_func(item) for item in x])
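Note that substituting per item turns tokens such as “,” into empty strings rather than removing them, while the expected output in the question has no empty entries. A minimal sketch of one way to drop them as well, assuming that is what is wanted (clean_tokens is a hypothetical helper name and row is sample data copied from the question):

```python
import re

def clean_tokens(tokens):
    """Strip non-word characters from each token and drop tokens left empty."""
    cleaned = (re.sub(r"\W+", "", token) for token in tokens)
    return [token for token in cleaned if token]

# Sample row taken from the question.
row = ["sure", "tune", "…", "watch", "donald", "trump", "“,”", "late", "’", "night"]
print(clean_tokens(row))
# ['sure', 'tune', 'watch', 'donald', 'trump', 'late', 'night']
```

With the DataFrame from the question, this could presumably be applied as train['content'] = train['content'].map(clean_tokens).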

Upvotes: 1

grim

Reputation: 9

Just do:

import re

content = ['sure', 'tune', '…', 'watch', 'donald', 'trump', '“,”', 'late', '’', 'night']
# Apply the substitution to each string in the list, not to the list itself
content = list(map(lambda x: re.sub(r'\W+', '', x), content))

Upvotes: 0
