Si Mon

Reputation: 163

Get text of children in a div with beautifulsoup

Hi, I want to get the description of an app from the Google Play Store (https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de).

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen("https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de"))
result = soup.find_all("div", {"class":"show-more-content text-body"})

With this code I get the whole content of the elements with this class, but I can't get only the text inside them. I tried a lot of things with next_sibling and .text, but they always throw errors (ResultSet has no attribute xxx).

I just want to get the text like this: "Die Android App von wetter.com! Sie erhalten: ..:"

Can anyone help me?

Upvotes: 14

Views: 73297

Answers (3)

verwirrt

Reputation: 165

If you want to extract the text from all matching elements into a list, a list comprehension comes in handy (result is the ResultSet returned by find_all() in the question):

texts = [r.text.strip() for r in result]
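As a rough, self-contained sketch of the same idea: the HTML below is a made-up stand-in (SAMPLE_HTML is a hypothetical name, and the real Play Store markup may differ), so nothing is fetched from the network.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; the real markup may differ.
SAMPLE_HTML = """
<div class="show-more-content text-body"><p> First description </p></div>
<div class="show-more-content text-body"><p>Second description</p></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
result = soup.find_all("div", {"class": "show-more-content text-body"})

# One stripped string per matching <div>.
texts = [r.text.strip() for r in result]
print(texts)
```

The list keeps one entry per matched element, which is useful when a page contains several such blocks.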

Upvotes: 1

Mowshon

Reputation: 999

Use the decode_contents() method.

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen("https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de"))
result = soup.find_all("div", {"class":"show-more-content text-body"})

for res in result:
    print(res.decode_contents().strip())

This gives you the inner HTML of each div, so any markup nested inside it is kept rather than stripped.
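To make the difference from plain text concrete, here is a small sketch on invented markup (SAMPLE_HTML is an assumption for illustration, not the actual page source). It compares decode_contents() with get_text() on the same element.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the description block.
SAMPLE_HTML = (
    '<div class="show-more-content text-body">'
    "<p>Die Android App von <b>wetter.com</b>!</p>"
    "</div>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.find("div", {"class": "show-more-content text-body"})

# decode_contents() keeps the nested tags (inner HTML).
inner_html = div.decode_contents()

# get_text() drops the tags and returns only the text nodes.
plain_text = div.get_text()

print(inner_html)
print(plain_text)
```

If only the readable description is needed, get_text() (or .text) is probably the better fit; decode_contents() is handy when the formatting tags should survive.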

Upvotes: 2

Martijn Pieters

Reputation: 1124668

Use the .text attribute on the elements; you have a list of results, so loop:

for res in result:
    print(res.text)

.text is a property that proxies for the Tag.get_text() method.
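Calling get_text() directly also lets you pass its separator and strip arguments, which helps when the text is spread over several child tags. A minimal sketch, assuming markup roughly like the snippet below (SAMPLE_HTML is invented for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real description block may be structured differently.
SAMPLE_HTML = (
    '<div class="show-more-content text-body">'
    "<p>Die Android App von wetter.com!</p>"
    "<p> Sie erhalten: </p>"
    "</div>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.find("div", {"class": "show-more-content text-body"})

# .text simply concatenates the text nodes as they are.
raw = div.text

# separator joins the pieces; strip=True trims each piece and skips empty ones.
clean = div.get_text(separator=" ", strip=True)

print(repr(raw))
print(repr(clean))
```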

Alternatively, if there is only ever supposed to be one such <div>, use .find() instead of .find_all():

result = soup.find("div", {"class":"show-more-content text-body"})
print(result.text)
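The error from the question comes from calling .text on the ResultSet itself rather than on an element inside it. The sketch below reproduces that on made-up markup (SAMPLE_HTML and the class name no-such-class are assumptions for illustration) and also guards against find() returning None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real Play Store page may look different.
SAMPLE_HTML = '<div class="show-more-content text-body">Die Android App von wetter.com!</div>'
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
attrs = {"class": "show-more-content text-body"}

# find_all() returns a list-like ResultSet, which has no .text of its own.
matches = soup.find_all("div", attrs)
try:
    matches.text
    raised = False
except AttributeError:
    raised = True
print("AttributeError raised:", raised)

# find() returns the first matching Tag, or None when nothing matches.
first = soup.find("div", attrs)
description = first.text if first is not None else ""
missing = soup.find("div", {"class": "no-such-class"})

print(description)
print(missing)
```

Checking for None before touching .text avoids a second AttributeError if the page layout changes and the div is no longer found.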

Upvotes: 39
