Archeofuturist

Reputation: 215

Python list object has no attribute error

I am new to Python and am trying to write a web scraper that collects links from subreddits, which I can later pass to another class to automatically download images from Imgur.

In this code snippet, I am just trying to read the subreddit and scrape any Imgur links from the hrefs, but I get the following error:

AttributeError: 'list' object has no attribute 'timeout'

Any idea as to why this might be happening? Here is the code:

from bs4 import BeautifulSoup
from urllib2 import urlopen
import sys
from urlparse import urljoin

def get_category_links(base_url):
    url = base_url
    html = urlopen(url)
    soup = BeautifulSoup(html)
    posts = soup('a',{'class':'title may-blank loggedin outbound'})
    #get the links with the class "title may-blank "
    #which is how reddit defines posts
    for post in posts:
        print post.contents[0]
        #print the post's title

        if post['href'][:4] =='http':
            print post['href']
        else:
            print urljoin(url,post['href'])
        #print the url.  
        #if the url is a relative url,
        #print the absolute url.   


get_category_links(sys.argv)

Upvotes: 1

Views: 3702

Answers (1)

alecxe

Reputation: 474191

Look at how you call the function:

get_category_links(sys.argv)

Here sys.argv is the list of script arguments, whose first item is the script name itself. This means your base_url argument is a list rather than a URL string, which makes urlopen fail:

>>> from urllib2 import urlopen
>>> urlopen(["I am", "a list"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
           │           │    │     └ <object object at 0x105e2c120>
           │           │    └ None
           │           └ ['I am', 'a list']
           └ <urllib2.OpenerDirector instance at 0x105edc638>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in open
    req.timeout = timeout
    │             └ <object object at 0x105e2c120>
    └ ['I am', 'a list']
AttributeError: 'list' object has no attribute 'timeout'

You meant to take the second item of sys.argv (the first actual command-line argument) and pass it to get_category_links:

get_category_links(sys.argv[1])
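
If the script might be run without a URL, sys.argv[1] would raise an IndexError instead. A minimal sketch of guarding against that follows; parse_base_url is a hypothetical helper name (it is not part of the question's code), and the usage message is only an assumption about how the script is meant to be invoked:

```python
import sys


def parse_base_url(argv):
    """Hypothetical helper: return the single URL argument from argv."""
    # argv[0] is the script name, so exactly one real argument means len(argv) == 2.
    if len(argv) != 2:
        prog = argv[0] if argv else "script"
        # SystemExit with a string prints the message to stderr and exits non-zero.
        raise SystemExit("usage: %s <subreddit-url>" % prog)
    return argv[1]


# Intended use at the bottom of the script (assumes get_category_links is defined):
#     get_category_links(parse_base_url(sys.argv))
```

This keeps the function itself receiving a plain string, so urlopen never sees the whole argument list.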

Interestingly, the error in this case is cryptic and hard to interpret. It comes from the way the "url opener" works in Python 2.7: if the url value (the first argument) is not a string, it assumes the value is a Request instance and tries to set a timeout attribute on it:

def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
    # accept a URL or a Request object
    if isinstance(fullurl, basestring):
        req = Request(fullurl, data)
    else:
        req = fullurl
        if data is not None:
            req.add_data(data)

    req.timeout = timeout  # <-- FAILS HERE

Note that this behavior is unchanged in the latest stable release, 3.6.
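
Since the opener accepts anything that is not a string and only fails later, one way to get a clearer message is to check the type yourself before calling urlopen. This is only a sketch: open_url is a hypothetical wrapper (not a urllib2 function), the default timeout of 10 seconds is an arbitrary assumption, and the try/except import is there so the same code runs on both Python 2 and Python 3:

```python
try:
    from urllib2 import Request, urlopen          # Python 2
    string_types = basestring
except ImportError:
    from urllib.request import Request, urlopen   # Python 3
    string_types = str


def open_url(url, timeout=10):
    """Hypothetical wrapper: reject anything that is not a URL string or Request."""
    if not isinstance(url, (string_types, Request)):
        # Fail early with a message that names the offending value.
        raise TypeError(
            "expected a URL string or Request, got %s: %r"
            % (type(url).__name__, url)
        )
    return urlopen(url, timeout=timeout)
```

With this in place, passing sys.argv by mistake raises a TypeError that mentions the list, rather than the AttributeError about 'timeout'.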

Upvotes: 4
