Reputation: 515
I have the following code and would like to make it compatible with both Python 2.7 and Python 3.6:
from re import sub, findall
return sub(r' ', ' ', sub(r'(\s){2,}', ' ', sub(r'[^a-z|\s|,]|_|(x)\1{1,}', '', x.lower())))
I received the following error: TypeError: cannot use a string pattern on a bytes-like object
I understand that Python 3 distinguishes between bytes and strings (Unicode), but I am not sure how to proceed.
Thanks.
I tried the following, but it does not work:
return sub(rb' ', b' ', sub(rb'(\s){2,}', b' ',sub(rb'[^a-z|\s|,]|_|(x)\1{1,}', b'', x.lower())))
Upvotes: 2
Views: 3437
Reputation: 437
Have you tried using re.findall? For instance:
import re
respdata = b''  # placeholder: replace with the data you are reading
content = re.findall(r'#findall from and too#', str(respdata))  # matches are returned as strings
for contents in content:
    print(contents)  # print each match
Upvotes: 1
Reputation: 1568
The "string" you have must be a series of bytes, which you can convert to a real string using x.decode('utf-8'). You can see the problem with a simple example:
>>> import re
>>> s = bytes('hello', 'utf-8')
>>> s
b'hello'
>>> re.search(r'[he]', s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 183, in search
    return _compile(pattern, flags).search(string)
TypeError: cannot use a string pattern on a bytes-like object
>>> s.decode('utf-8')
'hello'
>>> re.search(r'[he]', s.decode('utf-8'))
<re.Match object; span=(0, 1), match='h'>
I'm assuming your bytes represent UTF-8 data, but if you're working with a different encoding then just pass its name to decode() instead.
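Applied to the code in the question, one way this could look is sketched below. The helper names to_text and clean are hypothetical, UTF-8 is an assumed default, and the outermost sub(r' ', ' ', ...) from the question is left out because, as written, it replaces a space with a space. The idea is to decode once at the boundary and then use ordinary string patterns, which should run on both Python 2.7 and Python 3.6.

```python
import re


def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a bytes object."""
    # On Python 3, bytes and str are distinct types. On Python 2, bytes is an
    # alias of str, so plain strings are decoded to unicode here as well.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def clean(x, encoding='utf-8'):
    """Apply the substitutions from the question to text or bytes input."""
    text = to_text(x, encoding).lower()
    # Remove every character that is not a-z, '|', whitespace or a comma,
    # and remove runs of two or more consecutive 'x' characters.
    text = re.sub(r'[^a-z|\s|,]|_|(x)\1{1,}', '', text)
    # Collapse runs of two or more whitespace characters into one space.
    text = re.sub(r'(\s){2,}', ' ', text)
    return text
```

With this, clean(b'Hello,   WORLD') and clean('Hello,   WORLD') give the same text, so callers do not need to know which type they were handed. Note that on Python 2.7 the result is a unicode object rather than a str.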
Upvotes: 0