senaps

Reputation: 1525

Scraping <span> tag text with BeautifulSoup: object has no attribute 'text'

I have scraped a forum page and saved all the posts in a list called post_list, but I can't seem to go any further and find each post's author:

Here is what I get when I run the commands without trying to get the text:

for post in post_list:
    print(post.findAll("span", {"itemprop": "name"}))

This gives me:

[<span class="hide" itemprop="name">00Amin</span>]
[<span class="hide" itemprop="name">arminheidari</span>]
[<span class="hide" itemprop="name">Zapad</span>]
[<span class="hide" itemprop="name">iMosi</span>]
[<span class="hide" itemprop="name">arminheidari</span>]
[<span class="hide" itemprop="name">alen</span>]
[<span class="hide" itemprop="name">mahdavi3d</span>]
[<span class="hide" itemprop="name">arminheidari</span>]
[<span class="hide" itemprop="name">alen</span>]
[<span class="hide" itemprop="name">rezatizi</span>]
[<span class="hide" itemprop="name">Trooper</span>]
[<span class="hide" itemprop="name">rasoolmr</span>]
[<span class="hide" itemprop="name">arminheidari</span>]
[<span class="hide" itemprop="name">iMosi</span>]
[<span class="hide" itemprop="name">anybody</span>]

But if I try the same code with .text:

for post in post_list:
    print(post.findAll("span", {"itemprop": "name"}).text)

I get:

AttributeError: 'ResultSet' object has no attribute 'text'
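
A minimal sketch that reproduces the error outside the forum, assuming BeautifulSoup 4 (bs4) and a made-up HTML snippet standing in for one post (the markup is hypothetical, not the real forum's). find_all() is the bs4 spelling of findAll(). It shows that the call returns a ResultSet, which is a list of Tag objects, and that only the individual Tags carry .text:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a single forum post
html = '<div class="post"><span class="hide" itemprop="name">00Amin</span></div>'
post = BeautifulSoup(html, "html.parser")

result = post.find_all("span", {"itemprop": "name"})  # bs4 name for findAll()
print(type(result).__name__)     # ResultSet
print(isinstance(result, list))  # True: a ResultSet is a list subclass

try:
    result.text  # the list itself has no .text attribute
except AttributeError as exc:
    print("AttributeError:", exc)

print(result[0].text)  # an individual Tag does: 00Amin
```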

If I cheat and save the for-loop results in a variable (or a list) and then try to get the text from there, I fail again!

posts = []
for post in post_list:
    posts.append(post.findAll("span", {"itemprop": "name"}))

I get no error, but I still can't find any .text property on the items in that list.
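
The loop above stores one whole ResultSet per post, so posts ends up as a list of lists and its items still have no .text. A minimal sketch of flattening it, assuming bs4 and a hypothetical three-post page (the div/class="post" structure is made up to build post_list):

```python
from bs4 import BeautifulSoup

# Hypothetical page with three posts (not the real forum's markup)
html = """
<div class="post"><span class="hide" itemprop="name">00Amin</span></div>
<div class="post"><span class="hide" itemprop="name">arminheidari</span></div>
<div class="post"><span class="hide" itemprop="name">Zapad</span></div>
"""
soup = BeautifulSoup(html, "html.parser")
post_list = soup.find_all("div", class_="post")

# Same pattern as the question: each appended item is a ResultSet, not a Tag
posts = []
for post in post_list:
    posts.append(post.find_all("span", {"itemprop": "name"}))

# Flatten: loop over each ResultSet, then over each Tag inside it
names = [span.text for result in posts for span in result]
print(names)  # ['00Amin', 'arminheidari', 'Zapad']
```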

I have searched and tried the answers to some other questions I found, but they don't work.

Upvotes: 2

Views: 1874

Answers (1)

har07

Reputation: 89285

As the error message suggests, that's because findAll() returns a ResultSet (a list of Tag objects), which doesn't have a text attribute. You need to iterate through the result, or use a list comprehension:

for post in post_list:
    print([span.text for span in post.findAll("span", {"itemprop": "name"})])
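
Because the list comprehension walks the whole ResultSet, it also copes with posts that have no matching span or several of them. A minimal sketch, assuming bs4 and hypothetical markup in which the second post has no author span and the third has two:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: post 2 has no author span, post 3 has two
html = """
<div class="post"><span itemprop="name">alen</span></div>
<div class="post"><p>no author here</p></div>
<div class="post"><span itemprop="name">iMosi</span><span itemprop="name">anybody</span></div>
"""
soup = BeautifulSoup(html, "html.parser")
post_list = soup.find_all("div", class_="post")

all_names = []
for post in post_list:
    # One list of names per post; an empty list when nothing matches
    names = [span.text for span in post.find_all("span", {"itemprop": "name"})]
    all_names.append(names)
    print(names)
```

This prints ['alen'], then [], then ['iMosi', 'anybody'].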

If there is always exactly one span element per post (judging from the output of your first code snippet), you should be able to use find() instead of findAll(); find() returns the first matching element itself rather than a ResultSet:

for post in post_list:
    print(post.find("span", {"itemprop": "name"}).text)
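
One caveat with find(): it returns None when a post has no matching span, and calling .text on None raises AttributeError: 'NoneType' object has no attribute 'text'. A minimal sketch of guarding against that, assuming bs4 and hypothetical markup where the second post lacks the author span; using None as the placeholder is an arbitrary choice for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second post lacks the author span
html = """
<div class="post"><span itemprop="name">Trooper</span></div>
<div class="post"><p>deleted user</p></div>
"""
soup = BeautifulSoup(html, "html.parser")
post_list = soup.find_all("div", class_="post")

authors = []
for post in post_list:
    span = post.find("span", {"itemprop": "name"})  # first match, or None
    # Only read .text when a span was actually found
    author = span.text if span is not None else None
    authors.append(author)
    print(author)
```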

Upvotes: 3
