Reputation: 7864
I have the script below running on my Ubuntu Server 15.04 VPS, and it works perfectly. I'm tweaking it to run on my Raspberry Pi (fully updated Raspbian Wheezy), but BeautifulSoup4 isn't detecting the page elements like it does on the VPS. The code and traceback are below. Why is this error happening on my Pi but not on my VPS?
Here's the relevant piece of code. Among other things, os, BeautifulSoup (from bs4), and requests are imported. Lines 5 and below are inside a loop (the actual script loops over a dictionary to check all of the devices); I've verified that a) the commands below are what are actually running, and b) running the exact same code works on the VPS (data is returned) but not on the Pi (throws an error).
page = requests.get('https://developers.google.com/android/nexus/images')
soup = BeautifulSoup(page.text)
# loop starts here
cur = "/var/www/nexus_temp/shamu.html"
try:
os.remove(cur)
except OSError:
pass
g = open(cur, 'wb')
data = str(soup.select("h2#shamu ~ table")[0])
g.write(data)
g.close()
Traceback:
Traceback (most recent call last):
File "./nimages.py", line 40, in <module>
data = str(soup.select("h2#shamu ~ table")[0])
IndexError: list index out of range
Running the script from the Python command line and doing print soup.select("h2#shamu ~ table") just returns [], but print soup.find_all('h2') returns all of the <h2> elements on the page, as it should. Printing page.text does return the full page source code, as does soup.prettify().
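In case it helps to see the checks in one place, this is roughly the interactive session described above (Python 2 shell, using the same URL and selectors as the script):

import requests
from bs4 import BeautifulSoup

page = requests.get('https://developers.google.com/android/nexus/images')
soup = BeautifulSoup(page.text)

# Empty list on the Pi, populated list on the VPS
print soup.select("h2#shamu ~ table")

# Returns every <h2> element on the page on both machines
print soup.find_all('h2')

# Both of these print the full page source on both machines
print page.text
print soup.prettify()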
Upvotes: 1
Views: 175
Reputation: 5537
It might be a version issue with the version of Python being used. You could try Scrapy; using the HtmlXPathSelector you should be able to make it work (Scrapy runs on Python 2.7). I've made Scrapy work on a Raspberry Pi, so I can confirm it runs there.
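A rough sketch of what that could look like, assuming an older Scrapy release that still ships HtmlXPathSelector and reusing the URL, target element, and output path from the question (the XPath is my translation of the CSS selector h2#shamu ~ table, so treat it as a starting point rather than a tested solution):

import requests
from scrapy.selector import HtmlXPathSelector

# Fetch the same page the original script requests
page = requests.get('https://developers.google.com/android/nexus/images')

# Build a selector directly from the HTML text (no spider/crawler needed)
hxs = HtmlXPathSelector(text=page.text)

# XPath equivalent of the CSS sibling selector "h2#shamu ~ table":
# all <table> elements that follow the <h2 id="shamu"> heading
tables = hxs.select('//h2[@id="shamu"]/following-sibling::table')

if tables:
    data = tables[0].extract()
    with open('/var/www/nexus_temp/shamu.html', 'w') as g:
        # extract() returns unicode, so encode before writing (Python 2)
        g.write(data.encode('utf-8'))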
Upvotes: 0