Reputation: 666
I'm having trouble opening a big page (about 82,000 lines) with the Python requests library. I tried urllib2 first, but it failed with an "IncompleteRead" error.
Now with requests:
import requests
r = requests.get('https://www.bhphotovideo.com/c/search?atclk=Model+Year_2016&Ns=p_PRICE_2|0&ci=13223&ipp=120&N=4110474291+4294948825+3665082495')
page_source = r.content
print(page_source)
The printed source is incomplete: I can see the closing tag at the end, but not the start of the document.
Do you have any idea how to load the full content of this URL (all 82,000 lines)?
Upvotes: 0
Views: 195
Reputation: 81614
Most (if not all) shells and terminal windows limit how much output they keep, so the beginning of a very long print scrolls out of the buffer. Saving page_source to a file confirms that requests.get returns the whole page:
import io
import requests

r = requests.get('https://www.bhphotovideo.com/c/search?atclk=Model+Year_2016&Ns=p_PRICE_2|0&ci=13223&ipp=120&N=4110474291+4294948825+3665082495')
page_source = r.text
# io.open accepts an explicit encoding on both Python 2 and Python 3
with io.open('test.txt', 'w', encoding='utf-8') as f:
    f.write(page_source.strip())
The file contents start with <!DOCTYPE html>, which is the start of the page.
Also note that I'm using .text instead of .content: .text is the response body decoded to a string, while .content is the raw bytes, so .text gives a cleaner representation of the page source. I also used .strip() because this page's source starts with useless '\n' characters for some reason.
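To illustrate the .content versus .text distinction without hitting the network, here is a minimal Python 3 sketch. The byte string is a made-up stand-in for r.content (not the real page), and UTF-8 is assumed as the encoding; requests itself picks the encoding from the response headers or by detection.

```python
# Hypothetical stand-in for r.content: raw bytes with leading blank lines.
raw_bytes = b"\n\n<!DOCTYPE html>\n<html><body>caf\xc3\xa9</body></html>\n"

# Roughly what r.text gives you: the same payload decoded to a string.
# UTF-8 is an assumption made for this example.
decoded = raw_bytes.decode("utf-8")

# .strip() removes the leading and trailing whitespace, including '\n'.
cleaned = decoded.strip()

print(type(raw_bytes).__name__, type(decoded).__name__)
print(cleaned[:15])
```

Printing the bytes object directly in Python 3 would show escape sequences such as \n and \xc3\xa9, which is why the decoded string reads more cleanly.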
Another approach would be to simply print the first 100 (or whatever) characters of page_source:
print(page_source[:100])
# <!DOCTYPE html>
# <!--[if lt IE 7]> <html class="ie lt-ie7"> <![endif]-->
# <!--[if IE 7]>
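The same idea can be extended to check both ends of the page and its size without flooding the terminal. This is only a sketch: summarize_source is a hypothetical helper, and sample is a synthetic string standing in for r.text.

```python
def summarize_source(page_source, width=60):
    """Return the size and both ends of a long string without printing all of it."""
    return {
        "chars": len(page_source),
        "lines": len(page_source.splitlines()),
        "head": page_source[:width],
        "tail": page_source[-width:],
    }


# Synthetic stand-in for r.text; the real page content will differ.
sample = "<!DOCTYPE html>\n<html>\n" + "<p>row</p>\n" * 82000 + "</html>"

info = summarize_source(sample)
print(info["lines"])
print(repr(info["head"]))
print(repr(info["tail"]))
```

If the head shows the doctype and the tail shows the closing tag, the download is complete and only the terminal output was cut off.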
Upvotes: 2