Volatil3

Reputation: 14978

Python requests: the HTML of the page is not readable as text

I am trying to access a page, and its HTML looks like this:

?2?pɢ???=???I????܉??s????   [??AX#?`s??5???2`?| ,q?ɲ?=h?}VTŬ~?Y?}u3cx?pȢ?K_Ol&ɡ??'N??Y??n5?890??G???&$?%J#?ܩ?ѡ
1?y???
$]    &'ι?\?~T?=??@N?C?$??K? ??iu"T?M
  ?6>?&5?:??sJ???xi???V??N??????3R7u??ǹ??7qs??<*????????@3?
EWu}??'F??Z??߶O?????Fc۰?S???h??/????h???[kS(                        f?\˹?@e???7_~~??*'?Jq??i?͛?J?W?T?Y]S??ӫ?~??k՘H??
w?L??ws??M?h?V?؊<[ ?
??A?G?w?

What is that? Is it an encoding/decoding issue? How can I view the HTML?

The code is here:

import requests
from bs4 import BeautifulSoup
import json

headers_initial = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'upgrade-insecure-requests': '1',
}
r = requests.get('https://www.example.com/', headers=headers_initial)
if r.status_code == 200:
    html = r.text.strip()
    print(html)

Upvotes: 0

Views: 221

Answers (1)

Derlin

Reputation: 9881

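The problem comes from your headers. Just remove the accept-encoding header and it should work fine.

Edit: accept-encoding tells the server which compressed formats the client can decode. requests decodes gzip and deflate on its own, but it only decodes br (Brotli) when a Brotli package (brotli or brotlicffi) is installed. Because your header advertises br, the server most likely replied with Brotli-compressed data that requests could not decode, so r.text shows the raw compressed bytes. If you need to send the header, drop br from it or use the identity value, meaning "just send me the page without compression".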
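For illustration, here is a minimal sketch of that fix. It assumes the garbled output is a Brotli-compressed body that was left undecoded (requests handles gzip and deflate by itself, and Brotli only when a Brotli package is installed). The helper names without_brotli and fetch_html are hypothetical and not part of requests.

```python
def without_brotli(headers):
    """Return a copy of headers whose accept-encoding no longer advertises 'br'.

    Hypothetical helper for illustration; falls back to 'identity'
    (no compression) when nothing else is left in the header.
    """
    cleaned = dict(headers)
    for name in list(cleaned):
        if name.lower() == "accept-encoding":
            codings = [c.strip() for c in cleaned[name].split(",")]
            # Compare only the coding name, ignoring any ";q=..." weight.
            kept = [c for c in codings
                    if c and c.split(";")[0].strip().lower() != "br"]
            cleaned[name] = ", ".join(kept) or "identity"
    return cleaned


def fetch_html(url, headers):
    """Fetch a page with the cleaned headers and return its decoded text."""
    # Imported here so without_brotli() can be used without requests installed.
    import requests

    r = requests.get(url, headers=without_brotli(headers), timeout=10)
    r.raise_for_status()
    return r.text


# Usage (performs a real HTTP request):
# html = fetch_html('https://www.example.com/', headers_initial)
# print(html[:200])
```

Deleting the accept-encoding key from headers_initial altogether should work as well, since requests then sends its own default value listing only the encodings it can decode.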

Upvotes: 2
