metalups

Python BeautifulSoup doesn't work on a URL

I'm happy to join Stack Overflow :) This is the first time I couldn't find an answer to my problem :)

I would like to scrape the "meta description" of each URL in a list (stored in an SQL database).

When I run my script, it gets "killed" without any error message. It always gets killed while reading the 11th URL.
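A sketch of the overall job, assuming SQLite (the question only says "SQL database") and a hypothetical table `pages(url, meta_description)`; `get_description` is a placeholder for whatever fetch-and-parse code is used. The shell's "Killed" message usually means the process received SIGKILL (often from the Linux out-of-memory killer), which Python cannot catch, so the sketch prints and flushes each URL *before* processing it: if the process dies, the last logged line names the offending page.

```python
import sqlite3
import sys


def fill_descriptions(conn, get_description):
    """Store get_description(url) for every URL in the hypothetical `pages` table.

    get_description is any callable returning a string (or None); it stands in
    for the real fetch-and-parse step.
    """
    # Materialise the URL list first so the UPDATEs don't interfere with the SELECT cursor.
    urls = [row[0] for row in conn.execute("SELECT url FROM pages ORDER BY rowid")]
    for i, url in enumerate(urls, start=1):
        # Log before the risky work and flush at once: a SIGKILL gives Python
        # no chance to clean up, so only already-flushed output survives.
        print(f"[{i}/{len(urls)}] {url}", file=sys.stderr, flush=True)
        try:
            description = get_description(url)
        except Exception as exc:  # one bad page should not abort the whole run
            print(f"    skipped: {exc!r}", file=sys.stderr, flush=True)
            continue
        conn.execute(
            "UPDATE pages SET meta_description = ? WHERE url = ?", (description, url)
        )
    conn.commit()


if __name__ == "__main__":
    # Tiny in-memory demo with a fake get_description (no network access).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, meta_description TEXT)")
    conn.executemany(
        "INSERT INTO pages (url) VALUES (?)",
        [("http://example.com/a",), ("http://example.com/b",)],
    )
    fill_descriptions(conn, lambda url: "description of " + url)
    print(conn.execute("SELECT * FROM pages").fetchall())
```

Ordinary Python exceptions are caught per URL; a SIGKILL is not, which is exactly why the flushed log line matters here.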

I ran some tests and identified the problematic URL: "http://www.les-calories.com/famille-4.html"

So I made this test, reducing my code to the minimum:

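# encoding=utf8
# Python 2 code (in Python 3, urlopen lives in urllib.request)
from bs4 import BeautifulSoup
import urllib

# Download the raw HTML of the problematic page
html = urllib.urlopen("http://www.les-calories.com/famille-4.html").read()
# Parse it (no parser specified)
soup = BeautifulSoup(html)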

And this code gets "killed" by the shell.

I don't understand why...

Thank you for your help :)

Answers (1)

Nafiul Islam

It could be that you haven't specified a parser. In that case BeautifulSoup picks the best one installed (usually lxml), so try forcing Python's built-in parser instead:

soup = BeautifulSoup(html, "html.parser")

However, I think it is more likely that the HTML page is simply too large. What I'd do is use the requests package (python-requests) and set stream=True in the GET request, like so:

>>> import requests
>>> resp = requests.get("http://www.les-calories.com/famille-4.html", stream=True)
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(resp.text, "html.parser")
>>> soup.find("a")
<a href="http://www.fitadium.com/79-seche-et-definition-musculaire" target="_blank"><img border="0" height="60px" src="http://www.les-calories.com/images/234x60_pack-minceur-brule-graisses.gif" width="234px"/></a>
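One caveat: with stream=True, accessing resp.text still downloads and decodes the entire body, so memory use is not actually bounded. Because the meta description lives in <head>, an alternative is to read the body in chunks and stop as soon as the description or </head> shows up, with a hard byte cap. Below is a minimal sketch using only the standard library: html.parser is Python's built-in parser (the one BeautifulSoup uses when given "html.parser"), and urllib.request.urlopen is the Python 3 counterpart of urllib.urlopen. The function names, the 512 KiB cap, the 16 KiB chunk size and the UTF-8 fallback are assumptions for illustration, not part of the original answer.

```python
import codecs
from html.parser import HTMLParser
from urllib.request import urlopen


class MetaDescriptionParser(HTMLParser):
    """Records the content of <meta name="description"> and whether </head> was seen."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.head_closed = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; the default
        # handle_startendtag also routes <meta ... /> through here.
        if tag == "meta" and self.description is None:
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "description":
                self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "head":
            self.head_closed = True


def meta_description_from_stream(stream, encoding="utf-8",
                                 max_bytes=512 * 1024, chunk_size=16 * 1024):
    """Read a binary stream in chunks; stop at the description, </head>, or max_bytes."""
    parser = MetaDescriptionParser()
    # An incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    consumed = 0
    while consumed < max_bytes:
        chunk = stream.read(min(chunk_size, max_bytes - consumed))
        if not chunk:
            break
        consumed += len(chunk)
        parser.feed(decoder.decode(chunk))
        if parser.description is not None or parser.head_closed:
            break
    return parser.description


def fetch_meta_description(url, timeout=30):
    """Open the URL (Python 3) and parse only as much of the body as needed."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return meta_description_from_stream(resp, encoding=charset)
```

For example, fetch_meta_description("http://www.les-calories.com/famille-4.html") would read at most 512 KiB of that page. The trade-off is that a page whose description sits after the cap (or outside <head>) returns None.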