Reputation: 711
I am trying to automate downloading imgur files, and I am using BeautifulSoup to get the image link. To be honest, I am pretty lost on why this doesn't work, as according to my research it should:
soup = BeautifulSoup("http://imgur.com/ha0WYYQ")
imageUrl = soup.select('.image a')[0]['href']
The select call above just returns an empty list, so indexing it raises an error. I tried to modify it, but to no avail. Any and all input is appreciated.
Upvotes: 0
Views: 2013
Reputation: 477180
There are a few things wrong with your approach:
BeautifulSoup does not expect a URL, so you will need to use a library to fetch the HTML first; and the selector should be .post-image a, not .image a.
import urllib.request
from bs4 import BeautifulSoup
r = urllib.request.urlopen('http://imgur.com/ha0WYYQ').read()
soup = BeautifulSoup(r, 'lxml')
soup.select('.post-image a')[0]['href']
Or, more elegantly, with a context manager:
import urllib.request
with urllib.request.urlopen('http://imgur.com/ha0WYYQ') as f:
    r = f.read()
soup = BeautifulSoup(r, 'lxml')
result = soup.select('.post-image a')[0]['href']
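As a rough illustration of the first point, the sketch below compares what BeautifulSoup sees when it is handed the URL string with what it sees when it is handed actual HTML. The SAMPLE_HTML string is a hard-coded stand-in for the fetched page (an assumption, so the example runs offline), and it uses the built-in html.parser instead of lxml to avoid an extra dependency. Recent bs4 versions may also print a warning that the input looks like a URL.

```python
from bs4 import BeautifulSoup

# Case 1: a URL string. BeautifulSoup parses it as markup, not as an address,
# so the resulting document contains plain text and no tags at all.
url_soup = BeautifulSoup("http://imgur.com/ha0WYYQ", "html.parser")
url_matches = url_soup.select(".post-image a")

# Case 2: real HTML. This string is a stand-in for the page body that
# urlopen(...).read() would return.
SAMPLE_HTML = """
<div class="post-image">
  <a href="//i.imgur.com/ha0WYYQ.jpg" class="zoom">
    <img src="//i.imgur.com/ha0WYYQ.jpg" alt="Frank in his bb8 costume">
  </a>
</div>
"""
html_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
html_matches = html_soup.select(".post-image a")

print(len(url_matches))          # nothing to match in the URL-only "document"
print(html_matches[0]["href"])   # the href of the first matching anchor
```

Indexing url_matches[0] is what raises the error in the question, since the list is empty.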
Upvotes: 3
Reputation: 12168
<div class="post-image">
<a href="//i.imgur.com/ha0WYYQ.jpg" class="zoom">
<img src="//i.imgur.com/ha0WYYQ.jpg" alt="Frank in his bb8 costume" itemprop="contentURL">
</a>
</div>
This is the markup around the image. The class name "post-image" is a single token and cannot be split, so .image does not match it:
imageUrl = soup.select('.post-image a')[0]['href']
A shortcut for selecting a single tag:
imageUrl = soup.select_one('.post-image a')['href']
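A minimal sketch of how this might be used in practice, assuming the markup shown above. The helper first_href and the constants PAGE_URL and SAMPLE_HTML are hypothetical names introduced for illustration. It relies on select_one returning None when nothing matches, and it uses urljoin from the standard library to turn the protocol-relative href into a full URL that could then be downloaded.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the page address and its fetched HTML.
PAGE_URL = "http://imgur.com/ha0WYYQ"
SAMPLE_HTML = (
    '<div class="post-image">'
    '<a href="//i.imgur.com/ha0WYYQ.jpg" class="zoom"></a>'
    '</div>'
)

def first_href(soup, selector):
    """Return the href of the first element matching selector, or None."""
    tag = soup.select_one(selector)  # None when nothing matches
    return tag["href"] if tag is not None else None

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

href = first_href(soup, ".post-image a")   # matches the anchor
missing = first_href(soup, ".image a")     # "image" is not a class on the div

# The href starts with "//", so resolve it against the page URL.
image_url = urljoin(PAGE_URL, href)
print(image_url)
```

Checking for None avoids the IndexError that select(...)[0] raises when the selector matches nothing.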
To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("index.html"))
soup = BeautifulSoup("<html>data</html>")
Upvotes: 2