TheRutubeify

Reputation: 666

python requests loading big page solution

I have trouble opening a big page (about 82,000 lines) with the Python requests library. I previously tried urllib2, but it failed with an "IncompleteRead" error.

Now with requests:

 import requests

 r = requests.get('https://www.bhphotovideo.com/c/search?atclk=Model+Year_2016&Ns=p_PRICE_2|0&ci=13223&ipp=120&N=4110474291+4294948825+3665082495')
 page_source = r.content
 print(page_source)

The printed source is incomplete: I can see the closing tag at the end, but not the start of the document.

Do you have any idea how to load the full content of this URL (82,000 lines)?

Upvotes: 0

Views: 195

Answers (1)

DeepSpace

Reputation: 81614

Most (if not all) shells have a character limit, so the beginning of a very long printout can be cut off. Saving page_source to a file confirms that requests.get returns the whole page:

import requests

r = requests.get('https://www.bhphotovideo.com/c/search?atclk=Model+Year_2016&Ns=p_PRICE_2|0&ci=13223&ipp=120&N=4110474291+4294948825+3665082495')
page_source = r.text
with open('test.txt', 'w') as f:
    f.write(page_source.strip())

The file's contents start with <!DOCTYPE html>, which is the start of the page. Also note that I'm using .text instead of .content to get a cleaner representation of the page source. I also used .strip() because this page's source starts with useless '\n' characters for some reason.
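
To make the .text vs .content point concrete without hitting the network, here is a minimal sketch. The byte string `raw` is a hypothetical stand-in for r.content, and the UTF-8 decoding is an assumption; requests picks the encoding from the response itself.

```python
# Hypothetical stand-in for r.content: raw bytes with leading newlines,
# similar to what is described above for this page.
raw = b"\n\n<!DOCTYPE html>\n<html></html>\n"

# r.text is the decoded form of the same body. The real library chooses
# the encoding from the response; UTF-8 is assumed here for illustration.
text = raw.decode("utf-8")

# Drop the leading and trailing whitespace, as .strip() does in the answer.
cleaned = text.strip()

print(repr(raw[:12]))           # bytes, leading newlines still present
print(cleaned.splitlines()[0])  # first line of the cleaned text
```

The cleaned string begins at the doctype, which is why the saved file starts there rather than with blank lines.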

Another approach would be to simply print the first 100 (or whatever) characters of page_source:

print(page_source[:100])
# <!DOCTYPE html>
# <!--[if lt IE 7]>      <html class="ie lt-ie7"> <![endif]-->
# <!--[if IE 7]>   
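
Along the same lines, a quick way to convince yourself that nothing is missing is to look at both ends of the string and count its lines, instead of printing all of it. This is only a sketch: `summarize` is a hypothetical helper, and `sample` is a tiny stand-in for page_source.

```python
def summarize(page_source, n=100):
    """Return the first n chars, the last n chars, and the line count."""
    return page_source[:n], page_source[-n:], len(page_source.splitlines())


# Hypothetical stand-in for r.text; the real page would be far longer.
sample = "<!DOCTYPE html>\n<html>\n<body>hello</body>\n</html>"

head, tail, line_count = summarize(sample, n=15)
print(head)        # start of the document
print(tail)        # end of the document
print(line_count)  # number of lines received
```

With the real response you would call summarize(page_source) and compare the line count against the roughly 82,000 lines expected.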

Upvotes: 2
