user294015

Reputation: 65

BeautifulSoup: get content after next_sibling

I am trying to extract the values from this part of the page:

<div class="beschreibung">
<!-- Jahr -->
<strong class="main">Jahr:</strong>
2008<br/>
<!-- Jahr Ende -->
<!-- Genre -->
<strong class="main">Genre:</strong>
Action | Krimi | Drama<br/>
<!-- Genre Ende -->
<!-- Sprache -->
<strong class="main">Sprache:</strong>
Deutsch DTS-HD | Englisch DTS-HD<br/>
<!-- Sprache Ende -->
<!-- Länge -->
<strong class="main">Laufzeit:</strong>
90 Minuten<br/>
<!-- Länge Ende -->

So I tried the following:

for details in soup_.find_all("div", {"class" : "beschreibung"}):
    info = {details.text.rstrip(':'): details.next_sibling.strip() for s in details.find_all("strong")}
    print (repr(info))

I get this response:

{u"\n\nJahr: \r\n2010\n\n\nGenre: \r\nThriller | Mystery\n\n\nSprache: \r\nDeutsch DTS\n\n\nLaufzeit: \r\n76 Minuten\n\n": u''}

How can I get the correct content for "Jahr", "Sprache", etc.?

Upvotes: 1

Views: 58

Answers (2)

宏杰李

Reputation: 12158

from bs4 import BeautifulSoup
content = """
<div class="beschreibung">
<!-- Jahr -->
<strong class="main">Jahr:</strong>
2008<br/>
<!-- Jahr Ende -->
<!-- Genre -->
<strong class="main">Genre:</strong>
Action | Krimi | Drama<br/>
<!-- Genre Ende -->
<!-- Sprache -->
<strong class="main">Sprache:</strong>
Deutsch DTS-HD | Englisch DTS-HD<br/>
<!-- Sprache Ende -->
<!-- Lange -->
<strong class="main">Laufzeit:</strong>
90 Minuten<br/>
<!-- Lange Ende -->
</div>
"""
soup = BeautifulSoup(content, "lxml")

# Map each <strong> label (minus the trailing colon) to the text node right after it.
{i.text.rstrip(':'): i.next_sibling.strip() for i in soup.find_all('strong')}

Output:

{'Genre': 'Action | Krimi | Drama',
 'Jahr': '2008',
 'Laufzeit': '90 Minuten',
 'Sprache': 'Deutsch DTS-HD | Englisch DTS-HD'}
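
As a note on why the attempt in the question returns a single key with an empty value: the comprehension there iterates with `s` but reads `details.text` and `details.next_sibling`, so every iteration uses the whole `<div>` instead of the current `<strong>`. A minimal sketch of the corrected loop, assuming a shortened copy of the HTML and the built-in `html.parser` (any installed parser should behave the same way here):

```python
from bs4 import BeautifulSoup

# Shortened sample of the markup from the question (assumption: same structure).
html = """
<div class="beschreibung">
<strong class="main">Jahr:</strong>
2008<br/>
<strong class="main">Genre:</strong>
Action | Krimi | Drama<br/>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

for details in soup.find_all("div", {"class": "beschreibung"}):
    # Use the loop variable `s` (each <strong>), not `details` (the whole <div>).
    info = {
        s.text.strip().rstrip(":"): s.next_sibling.strip()
        for s in details.find_all("strong")
    }
    print(info)
```

Calling `.strip()` with no argument also removes the `\r\n` that shows up in the question's output.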

Upvotes: 0

Rustem K

Reputation: 1242

Do you mean something like this?

from bs4 import BeautifulSoup
content = """
<div class="beschreibung">
<!-- Jahr -->
<strong class="main">Jahr:</strong>
2008<br/>
<!-- Jahr Ende -->
<!-- Genre -->
<strong class="main">Genre:</strong>
Action | Krimi | Drama<br/>
<!-- Genre Ende -->
<!-- Sprache -->
<strong class="main">Sprache:</strong>
Deutsch DTS-HD | Englisch DTS-HD<br/>
<!-- Sprache Ende -->
<!-- Lange -->
<strong class="main">Laufzeit:</strong>
90 Minuten<br/>
<!-- Lange Ende -->
</div>
"""
soup = BeautifulSoup(content, "html.parser") 
info = {}
for details in soup.find_all("div", {"class" : "beschreibung"}):
    for strong in details.find_all('strong'):
        # strip() with no argument also removes the '\r' seen in the question's output
        info[strong.text.rstrip(':')] = strong.next_sibling.strip()
    print(info)

This code will result in the following output:

{u'Genre': u'Action | Krimi | Drama', u'Jahr': u'2008', u'Laufzeit': u'90 Minuten', u'Sprache': u'Deutsch DTS-HD | Englisch DTS-HD'}
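
If some entries on the real page have no text directly after the `<strong>` tag, `next_sibling` can be another tag or `None`, and calling `.strip()` on it would fail or give unexpected results. A possible defensive variant is sketched below; the helper name `extract_fields` and the `Leer` entry are hypothetical and only serve to illustrate the empty case:

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def extract_fields(div):
    """Map each <strong> label in `div` to the text node that follows it."""
    fields = {}
    for strong in div.find_all("strong"):
        label = strong.get_text(strip=True).rstrip(":")
        sibling = strong.next_sibling
        # Accept only plain text; tags, comments, and a missing sibling give None.
        if isinstance(sibling, NavigableString) and not isinstance(sibling, Comment):
            fields[label] = sibling.strip()
        else:
            fields[label] = None
    return fields


# Hypothetical sample: "Leer" has no text between </strong> and <br/>.
html = """
<div class="beschreibung">
<strong class="main">Jahr:</strong>
2008<br/>
<strong class="main">Leer:</strong><br/>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
result = extract_fields(soup.find("div", class_="beschreibung"))
print(result)
```

Returning `None` for a missing value is just one choice; an empty string or skipping the key would work as well, depending on how the result is used later.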

Upvotes: 2
