tudou

Reputation: 515

Python 2 to Python 3 migration: TypeError: cannot use a string pattern on a bytes-like object

I have the following code and would like to make it compatible with both Python 2.7 and Python 3.6:

from re import sub, findall

return sub(r'  ', ' ', sub(r'(\s){2,}', ' ', sub(r'[^a-z|\s|,]|_|(x)\1{1,}', '', x.lower())))

I received the following error: TypeError: cannot use a string pattern on a bytes-like object

I understand that Python 3 distinguishes between bytes and strings (Unicode), but I am not sure how to proceed.

Thanks.

I also tried the following, but it did not work:

return sub(rb'  ', b' ', sub(rb'(\s){2,}', b' ',sub(rb'[^a-z|\s|,]|_|(x)\1{1,}', b'', x.lower())))

Upvotes: 2

Views: 3437

Answers (2)

Barb

Reputation: 437

Have you tried using re.findall? For instance:

import re

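respdata = b''    # placeholder: the data you are reading

# Note: on Python 3, str() of a bytes object returns its repr ("b'...'"), not the decoded text
content = re.findall(r'#findall from and too#', str(respdata))    # list of matches, as strings
for contents in content:
    print(contents)    # print each match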

Upvotes: 1

tar

Reputation: 1568

The "string" you have must be a series of bytes, which you can convert to a real string using x.decode('utf-8'). You can see the problem with a simple example:

>>> import re
>>> s = bytes('hello', 'utf-8')
>>> s
b'hello'
>>> re.search(r'[he]', s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 183, in search
    return _compile(pattern, flags).search(string)
TypeError: cannot use a string pattern on a bytes-like object
>>> s.decode('utf-8')
'hello'
>>> re.search(r'[he]', s.decode('utf-8'))
<re.Match object; span=(0, 1), match='h'>

I'm assuming your bytes represent UTF-8 data, but if you're working with a different encoding then just pass its name to decode() instead.
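To make the original substitution chain work on both Python 2.7 and Python 3.6, one option is to decode only when the input is actually bytes and then run the string patterns unchanged. The sketch below assumes UTF-8 input; the names to_text and clean are made up for illustration and are not from the question.

```python
import re


def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a bytes object."""
    # On Python 3, bytes is a separate type from str. On Python 2.7, bytes is
    # an alias of str, so plain byte strings are decoded to unicode there.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def clean(x, encoding='utf-8'):
    """Apply the three substitutions from the question to text or bytes."""
    text = to_text(x, encoding).lower()
    # Drop disallowed characters, underscores, and runs of two or more 'x'.
    text = re.sub(r'[^a-z|\s|,]|_|(x)\1{1,}', '', text)
    # Collapse runs of whitespace into a single space.
    text = re.sub(r'(\s){2,}', ' ', text)
    return re.sub(r'  ', ' ', text)
```

On Python 3, clean(b'Hello,  World_123') and clean('Hello,  World_123') both return 'hello, world'. Decoding once at the boundary keeps the patterns as ordinary strings, so there is no need to maintain a second set of rb'...' patterns.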

Upvotes: 0
