Reputation: 163
Hi, I want to extract the description of an app from the Google Play Store (https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de).
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen("https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de"))
result = soup.find_all("div", {"class":"show-more-content text-body"})
With this code I get the whole content of this class, but I can't get only the text inside it. I tried a lot of things with next_sibling or .text, but it always throws errors (ResultSet has no attribute xxx).
I just want to get the text like this: "Die Android App von wetter.com! Sie erhalten: ..:"
Can anyone help me?
Upvotes: 14
Views: 73297
Reputation: 165
If you want to extract the text from all elements into a list, a list comprehension can come in handy:
texts = [r.text.strip() for r in result]
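As a minimal sketch of the list comprehension above, the snippet below runs it against a small inline HTML string instead of the live Play Store page. The markup and its contents are assumptions made up for illustration; only the class name comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real Play Store page is not fetched here.
html = """
<div class="show-more-content text-body"> First description </div>
<div class="show-more-content text-body">Second <b>description</b></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of Tag objects), not a single Tag.
result = soup.find_all("div", {"class": "show-more-content text-body"})

# Read .text on each Tag individually and trim surrounding whitespace.
texts = [r.text.strip() for r in result]
print(texts)
```

Each entry in texts is a plain string, so nested tags such as <b> are dropped and only their text is kept.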
Upvotes: 1
Reputation: 999
Use the decode_contents() method.
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen("https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de"))
result = soup.find_all("div", {"class":"show-more-content text-body"})
for res in result:
    print(res.decode_contents().strip())
This gives you the inner HTML of each div, markup included.
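To make the difference between inner HTML and plain text concrete, here is a small comparison on an inline HTML string. The markup is a made-up stand-in for the Play Store page, so treat it as an illustration rather than the page's real structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a nested tag, used instead of the live page.
html = (
    '<div class="show-more-content text-body">'
    "Die Android App von <b>wetter.com</b>!"
    "</div>"
)

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "show-more-content text-body"})

# decode_contents() keeps the child markup (the inner HTML).
inner_html = div.decode_contents()

# get_text() drops the tags and returns only the text nodes.
plain_text = div.get_text()

print(inner_html)
print(plain_text)
```

If only the readable description is wanted, get_text() (or .text) is probably the better fit; decode_contents() is useful when the formatting tags should be preserved.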
Upvotes: 2
Reputation: 1124668
Use the .text attribute on the elements; you have a list of results, so loop:
for res in result:
    print(res.text)
.text is a property that proxies for the Element.get_text() method.
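Because .text simply calls get_text() with default arguments, calling get_text() directly allows a separator and whitespace stripping to be passed in. The sketch below assumes a hypothetical description made of two paragraphs; the markup is invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a description split into two paragraphs.
html = (
    '<div class="show-more-content text-body">'
    "<p> Line one </p><p> Line two </p>"
    "</div>"
)

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "show-more-content text-body"})

# .text and get_text() with no arguments return the same string.
same = div.text == div.get_text()

# separator joins the text pieces; strip=True trims each piece first.
cleaned = div.get_text(separator="\n", strip=True)

print(same)
print(cleaned)
```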
Alternatively, if there is only ever supposed to be one such <div>, use .find() instead of .find_all():
result = soup.find("div", {"class":"show-more-content text-body"})
print(result.text)
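One caveat with .find() is that it returns None when nothing matches, so result.text would then raise an AttributeError. A minimal sketch of a guard follows; the helper name extract_description and the sample markup are hypothetical and not part of BeautifulSoup or the original question.

```python
from bs4 import BeautifulSoup


def extract_description(html):
    """Return the stripped text of the first matching div, or None.

    Hypothetical helper for illustration; the class name is the one
    used in the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    result = soup.find("div", {"class": "show-more-content text-body"})
    if result is None:
        # find() found no matching element.
        return None
    return result.text.strip()


# Made-up sample pages: one with the expected div, one without.
page_with_div = '<div class="show-more-content text-body"> Hello </div>'
page_without_div = '<div class="other">Nothing here</div>'

print(extract_description(page_with_div))
print(extract_description(page_without_div))
```

Returning None here is just one design choice; raising a descriptive exception would work equally well if a missing description should be treated as an error.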
Upvotes: 39