no nein

Reputation: 711

Getting URL of a picture on imgur

I am trying to automate downloading imgur files, and I am using BeautifulSoup to get the image link. To be honest, I am pretty lost as to why this doesn't work, since according to my research it should:

    soup = BeautifulSoup("http://imgur.com/ha0WYYQ")
    imageUrl = soup.select('.image a')[0]['href']

The select call above just returns an empty list, so indexing it raises an error. I tried to modify it, but to no avail. Any and all input is appreciated.

Upvotes: 0

Views: 2013

Answers (2)

willeM_ Van Onsem

Reputation: 477180

There are a few things wrong with your approach:

  • BeautifulSoup does not expect a URL, so you will need to use a library to fetch the HTML first; and
  • Your selector seems invalid. Based on what I can see, it should be .post-image a.

    # Python 3: urlopen lives in urllib.request
    import urllib.request
    from bs4 import BeautifulSoup

    # Fetch the page first, then hand the HTML to BeautifulSoup
    r = urllib.request.urlopen('http://imgur.com/ha0WYYQ').read()
    soup = BeautifulSoup(r, 'lxml')
    soup.select('.post-image a')[0]['href']

Or, more elegantly:

    with urllib.request.urlopen('http://imgur.com/ha0WYYQ') as f:
        r = f.read()
        soup = BeautifulSoup(r, 'lxml')
        result = soup.select('.post-image a')[0]['href']
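To see why the original attempt finds nothing, it may help to look at what BeautifulSoup does with a bare URL string. The sketch below is only an illustration: it assumes bs4 is installed and uses the built-in html.parser rather than lxml so that no extra dependency is needed. The URL is treated as the document text itself, so there are no tags for any selector to match.

```python
import warnings
from bs4 import BeautifulSoup

page_url = "http://imgur.com/ha0WYYQ"

# bs4 may warn that the markup looks like a URL; silence that for this demo.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    soup = BeautifulSoup(page_url, "html.parser")

# The "document" is just the URL text: it contains no tags at all,
# so any CSS selector comes back empty.
tags = soup.find_all(True)
matches = soup.select(".image a")

print(repr(soup.get_text()))
print(tags, matches)
```

Indexing the empty result with [0] is what raises the error described in the question.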

Upvotes: 3

宏杰李

Reputation: 12168

    <div class="post-image">
        <a href="//i.imgur.com/ha0WYYQ.jpg" class="zoom">
            <img src="//i.imgur.com/ha0WYYQ.jpg" alt="Frank in his bb8 costume" itemprop="contentURL">
        </a>
    </div>

This is the markup for the image. The class name "post-image" is a single word and cannot be split, so a selector for .image will not match it.

    imageUrl = soup.select('.post-image a')[0]['href']

A shortcut for selecting a single tag:

    imageUrl = soup.select_one('.post-image a')['href']
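Note that the href in the markup above is protocol-relative (it starts with //), so it needs a scheme before it can be downloaded. A minimal sketch using only the standard library follows; the helper names absolute_image_url and image_filename are made up for illustration and are not part of BeautifulSoup.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def absolute_image_url(page_url, href):
    """Resolve an href found on page_url into a full URL (hypothetical helper)."""
    # urljoin borrows the scheme from the page URL for "//host/path" links.
    return urljoin(page_url, href)


def image_filename(image_url):
    """Return the last path segment, usable as a local file name (hypothetical helper)."""
    return posixpath.basename(urlparse(image_url).path)


page_url = "http://imgur.com/ha0WYYQ"
href = "//i.imgur.com/ha0WYYQ.jpg"  # value of the href attribute shown above

image_url = absolute_image_url(page_url, href)
print(image_url)                  # http://i.imgur.com/ha0WYYQ.jpg
print(image_filename(image_url))  # ha0WYYQ.jpg
```

The resulting URL can then be fetched the same way as the page itself.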

To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(open("index.html"))

    soup = BeautifulSoup("<html>data</html>")

Upvotes: 2
