Python BeautifulSoup doesn't work on a URL

Question

I'm happy to join Stack Overflow :) This is the first time I couldn't find an answer to my problem :)

I would like to scrape the "meta description" from a list of URLs (stored in an SQL database).
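For the goal described here, a minimal sketch of the extraction step could use the standard library's html.parser.HTMLParser instead of BeautifulSoup (a swapped-in technique, not what the thread uses). It reports tags as it scans and never builds a document tree. The names MetaDescriptionParser and meta_description are hypothetical, and fetching the URLs and reading them from the SQL database are not shown.

```python
from html.parser import HTMLParser
from typing import Optional


# Hypothetical helper: stdlib alternative to BeautifulSoup for one tag.
class MetaDescriptionParser(HTMLParser):
    """Remembers the content of the first <meta name="description"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list
        # of (name, value) pairs, and value is None for bare attributes.
        if tag != "meta" or self.description is not None:
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "description":
            self.description = attr_map.get("content")

    # Self-closing <meta ... /> tags go through handle_startendtag, which
    # by default calls handle_starttag, so they are covered as well.


def meta_description(html: str) -> Optional[str]:
    """Return the page's meta description, or None if it has none."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.description
```

Because HTMLParser.feed() accepts partial input, the HTML can also be fed in pieces as it is downloaded rather than as one large string.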

When I run my script, it gets "killed" without any error message, while reading the 11th URL.

I ran some tests and identified the URL that triggers it: "http://www.les-calories.com/famille-4.html"

So I made this test, reducing my code to the bare minimum:

# encoding=utf8
# Python 2 code: urllib.urlopen became urllib.request.urlopen in Python 3
from bs4 import BeautifulSoup
import urllib
html = urllib.urlopen("http://www.les-calories.com/famille-4.html").read()
soup = BeautifulSoup(html)

And this code gets "killed" by the shell.


I don't understand why...

Thank you for your help :)

Nafiul Islam · Accepted Answer

It could be that you haven't specified a parser, in which case do the following:

soup = BeautifulSoup(html, "html.parser")

However, I think it is more likely that the HTML page is simply too large. What I'd do is use the requests package and set stream=True in the GET request, like so:

>>> import requests
>>> resp = requests.get("http://www.les-calories.com/famille-4.html", stream=True)
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(resp.text, "html.parser")
>>> soup.find("a")
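One caveat about stream=True: it only defers downloading the response body, and accessing resp.text (as above) still reads the whole body into memory. A sketch of actually bounding the download, assuming a byte limit is acceptable when all you need is the meta description near the top of the page, is to consume resp.iter_content() incrementally and stop at a cap. The read_capped helper below is hypothetical; it accepts any iterable of byte chunks, so it can be tried without a network connection.

```python
from typing import Iterable


def read_capped(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Concatenate byte chunks, stopping once max_bytes have been collected.

    Hypothetical helper: pass it resp.iter_content(chunk_size=...) from a
    requests response opened with stream=True.
    """
    buf = bytearray()
    for chunk in chunks:
        if not chunk:  # skip empty chunks
            continue
        buf.extend(chunk)
        if len(buf) >= max_bytes:
            break  # stop pulling more data from the connection
    return bytes(buf[:max_bytes])
```

With the answer's code, this would mean calling read_capped(resp.iter_content(chunk_size=8192), 2000000), then resp.close(), then BeautifulSoup(..., "html.parser") on the returned bytes. The 2 MB cap is an arbitrary assumption, and a page cut off mid-tag is still parsed leniently by html.parser.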
