Reputation: 697
I'm using Scrapy and getting a weird response. The URL looks like this (notice the UTF-8 encoded check mark): https://www.example.com?sort=relevancy&utf8=%E2%9C%9
I'm getting a 200 response, but the body is a bytes object that looks like this:
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xec\xbd\xedv\xdb\xb6\xb20\xfc?W\x81r\x9f\'\xb6OE\x8a\....
What is this? How do I handle it? Can I have Scrapy automatically decode responses that look like this?
Upvotes: 3
Views: 1329
Reputation: 305
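The answer is in the comments by @drec4s and @furas.
First, try decoding the response body:
response.body.decode('utf-8')
Or let Scrapy pick the encoding for you:
response.text
(older Scrapy versions expose the same thing as response.body_as_unicode(), which has since been deprecated)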
If you get decoding errors or an unreadable string, you could try different encodings, but most likely the response body is compressed. Check the response headers for something like
content-encoding: br
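Or it could also be 'gzip'. The leading bytes \x1f\x8b in the question are the gzip magic number, so gzip is the likely candidate here.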
In that case, you can ask the server to return an uncompressed response by setting this request header ('identity' means no compression, whereas 'deflate' is itself a compressed format):
accept-encoding: identity
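If the compressed bytes have already reached your callback, another option is to decompress them yourself. Below is a minimal sketch using only Python's standard library. It assumes the body is gzip, detected by its two-byte signature, and that the decompressed text is UTF-8. The name body_to_text is a hypothetical helper for illustration, not part of Scrapy.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def body_to_text(body: bytes, encoding: str = "utf-8") -> str:
    """Return the body as text, gunzipping first if it looks like gzip.

    Hypothetical helper: the encoding default is an assumption, so pass
    a different one if the server declares another charset.
    """
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return body.decode(encoding)


# Example with locally built data (no network involved).
raw = gzip.compress("sort=relevancy".encode("utf-8"))
print(body_to_text(raw))
```

In a spider callback this might be called as body_to_text(response.body). Scrapy's HttpCompressionMiddleware, which is enabled by default, normally decompresses gzip responses for you, so seeing raw gzip bytes may mean the middleware is disabled or the server did not declare the encoding in its headers.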
Upvotes: 2