sophiaw

Reputation: 313

Iterating in Python and BeautifulSoup

soup = BeautifulSoup(html).findAll('div', 'thread')
for i in soup:
    print i

I'm only showing this part of the code because that's where I'm getting stuck.

soup is a list of tags. I tried ' '.join() to turn it into a single string, but that didn't work, because join expects strings, not Tag objects. I guess it's some sort of bug.

When I iterate over it, it prints every item in the list, without the commas.

But what I want is the href of the link inside each div class="thread".
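A minimal sketch of why the join fails and one way around it, written for Python 3 and the bs4 package (BeautifulSoup 4) rather than the older version used in the question. The sample markup is made up for illustration, and the sketch assumes bs4 is installed. str.join only accepts strings, so each tag is converted with str() first:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question.
html = '<div class="thread">a</div><div class="thread">b</div>'

tags = BeautifulSoup(html, "html.parser").find_all("div", "thread")

# str.join() raises TypeError on Tag objects, so convert each tag to its
# markup string before joining.
joined = " ".join(str(tag) for tag in tags)
print(joined)
```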

I tried many things like

soup = BeautifulSoup(html).findAll('div', 'thread')
for i in soup:
    print BeautifulSoup(i)('a')['href']

The last snippet gives me: 'NoneType' object is not callable.

I've tried a lot of combinations, but I'm truly stuck and can't get it working at all. After so many failed attempts I don't know what to try next. It's frustrating.

Upvotes: 1

Views: 4408

Answers (2)

methyl

Reputation: 3312

It should be something like

divs = BeautifulSoup(html).findAll('div', 'thread')
for div in divs:
    # find('a') returns the first <a> tag inside the div; indexing a tag reads an attribute
    print div.find('a')['href']  # I think dict(a.attrs)['href'] works too, I don't remember now
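For anyone on a current setup, here is a rough equivalent for Python 3 and the bs4 package (BeautifulSoup 4). The sample HTML and the name thread_links are invented for illustration, and the sketch assumes bs4 is installed. It also skips divs that contain no link, since find returns None in that case:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question.
html = """
<div class="thread"><a href="/t/1">First</a></div>
<div class="thread"><a href="/t/2">Second</a></div>
<div class="other"><a href="/skip">Skip</a></div>
<div class="thread">no link here</div>
"""

soup = BeautifulSoup(html, "html.parser")

thread_links = []
for div in soup.find_all("div", class_="thread"):
    link = div.find("a")  # None when the div contains no <a> tag
    if link is not None and link.has_attr("href"):
        thread_links.append(link["href"])

print(thread_links)
```

Checking for None before indexing avoids a TypeError on threads without a link.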

Upvotes: 2

hellatan

Reputation: 3577

Taking a look at the documentation for this module (http://www.crummy.com/software/BeautifulSoup/documentation.html), the second argument to findAll can be a dictionary of attributes (a Python dict, not a JSON object) rather than a bare string. Have you tried this instead:

BeautifulSoup(html).findAll('div', { 'class': 'thread' })
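As a quick check, this sketch compares the string form, the dictionary form, and the class_ keyword in Python 3 with the bs4 package (BeautifulSoup 4). It assumes bs4 is installed, and the markup is made up for illustration. In bs4, a bare string passed as the second argument is treated as a CSS class filter, so all three calls should select the same divs:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question.
html = (
    '<div class="thread">a</div>'
    '<div class="other">b</div>'
    '<div class="thread">c</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Three ways of filtering divs by CSS class.
by_string = [d.get_text() for d in soup.find_all("div", "thread")]
by_dict = [d.get_text() for d in soup.find_all("div", {"class": "thread"})]
by_keyword = [d.get_text() for d in soup.find_all("div", class_="thread")]

print(by_string, by_dict, by_keyword)
```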

Upvotes: 1
