superdee

Reputation: 697

Scrapy returning weirdly encoded string

I'm using Scrapy and getting a weird response. The URL looks like this (notice the UTF-8 encoded check mark): https://www.example.com?sort=relevancy&utf8=%E2%9C%9

I'm getting a 200 response, but the body is a bytes object that looks like this:

b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xec\xbd\xedv\xdb\xb6\xb20\xfc?W\x81r\x9f\'\xb6OE\x8a\....

What is this? How do I handle this? Can I have scrapy automatically decode stuff that looks like this?

Upvotes: 3

Views: 1329

Answers (1)

Way Too Simple

Reputation: 305

The answer is in the comments by @drec4s and @furas.

You can first try to decode the response body:

response.body.decode('utf-8')

Or alternatively (recent Scrapy versions deprecate this method in favor of response.text):

response.body_as_unicode()

If you get decoding errors or an unreadable string, you could try different encodings, but most likely the response body is compressed. Check the response headers for something like

content-encoding: br

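It could also be gzip. The bytes in the question start with \x1f\x8b, which is the gzip magic number, so gzip is the likely case here.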

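In that case, you can ask the server to return an uncompressed response by setting this in the request headers (note that deflate is itself a compression scheme; identity is the value that means "no encoding"):

accept-encoding: identity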
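
If the server keeps sending compressed data anyway, the body can also be decompressed by hand. The following is a minimal sketch using only the Python standard library, not something from the original comments. The helper name decode_body and its charset parameter are hypothetical, and it assumes the payload is gzip (recognized by its leading \x1f\x8b bytes) and that the decompressed text is UTF-8. Brotli ("br") is not handled, since it needs a third-party package.

```python
import gzip

# Every gzip stream starts with these two bytes (the "magic number").
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(body: bytes, charset: str = "utf-8") -> str:
    """Hypothetical helper: gunzip the body if it looks like gzip, then decode it.

    Assumes the uncompressed payload is text in `charset`.
    """
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return body.decode(charset)


# Round trip with a locally compressed sample, standing in for response.body.
sample = gzip.compress("sort=relevancy ✓".encode("utf-8"))
text = decode_body(sample)
```

In a Scrapy callback this would presumably be called as decode_body(response.body). Scrapy normally decompresses gzip responses itself through its HTTP compression middleware, so seeing raw gzip bytes may mean that middleware is disabled or that the server did not label the response with a content-encoding header.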

Upvotes: 2
