xerxes01

Reputation: 125

Access a Beautiful Soup element in nested HTML

I want to extract the director and actor elements from this parsed HTML output of the IMDb Top 250 page. What would a Python one-liner for this look like? The class "text-muted text-small" appears multiple times, so find_all alone does not seem to be the best way to go about it.

<span class="ipl-rating-selector__rating-value">0</span>
</div>
<div class="ipl-rating-selector__error ipl-rating-selector__wrapper">
<span>Error: please try again.</span>
</div>
</div>
<div class="ipl-rating-interactive__loader">
<img alt="loading" src="https://m.media-amazon.com/images/G/01/IMDb/spinning-progress.gif"/>
</div>
</div>
</div>
<div class="inline-block ratings-metascore">
<span class="metascore favorable">80        </span>
        Metascore
        </div>
<p class="">
    Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.</p>
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span> 
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>, 
<a href="/name/nm0000151/">Morgan Freeman</a>, 
<a href="/name/nm0348409/">Bob Gunton</a>, 
<a href="/name/nm0006669/">William Sadler</a>
</p>
<p class="text-muted text-small">
<span class="text-muted">Votes:</span>
<span data-value="2187696" name="nv">2,187,696</span>
<span class="ghost">|</span> <span class="text-muted">Gross:</span>
<span data-value="28,341,469" name="nv">$28.34M</span>
</p>
<div class="wtw-option-standalone" data-baseref="wl_li" data-tconst="tt0111161" data-watchtype="minibar"></div>
</div>

Upvotes: 0

Views: 106

Answers (3)

mhdev

Reputation: 166

This selects the containing p tag and iterates over its child tags, printing directors and actors separately:

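# Select the <p> that holds the credits. Newer Soup Sieve releases (2.1+)
# deprecate :contains in favor of :-soup-contains.
director_and_stars_tag = soup.select_one('p:contains("Director:")')
directors_flag = True

# find_all() is the current name for the legacy findChildren() alias
for name_tag in director_and_stars_tag.find_all():
    if directors_flag:
        # Links before the <span class="ghost"> separator are directors
        if name_tag.name == 'span':
            directors_flag = False
        else:
            print('Director: %s' % name_tag.string)
    else:
        # Links after the separator are actors
        print('Actor: %s' % name_tag.string)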

Output:

Director: Frank Darabont
Actor: Tim Robbins
Actor: Morgan Freeman
Actor: Bob Gunton
Actor: William Sadler

Upvotes: 1

Moshe perez

Reputation: 1726

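If you are using Beautiful Soup 4.7.0 or higher, you can use the :contains CSS selector (Soup Sieve 2.1 and later deprecate it in favor of :-soup-contains). A comma-separated list matches a tag that contains any of the given strings:

from bs4 import BeautifulSoup

soup = BeautifulSoup(your_html, "html.parser")
soup.select_one('p:contains("Director:","Stars:")')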
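The selector above only returns the enclosing p tag, so the names still have to be pulled out of it. A minimal sketch of one way to do that, assuming the markup always places a span with class "ghost" between the director links and the star links (as in the question's sample) and that the installed Soup Sieve is recent enough to understand :-soup-contains. The html string is a trimmed copy of the question's markup, and the variable names are illustrative only:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question
html = """
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>,
<a href="/name/nm0000151/">Morgan Freeman</a>,
<a href="/name/nm0348409/">Bob Gunton</a>,
<a href="/name/nm0006669/">William Sadler</a>
</p>
<p class="text-muted text-small">
<span class="text-muted">Votes:</span>
<span data-value="2187696" name="nv">2,187,696</span>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# First <p> whose text contains either label
credits_tag = soup.select_one('p:-soup-contains("Director:", "Stars:")')

# Assumption: the "|" separator span splits directors from stars
separator = credits_tag.select_one("span.ghost")

# find_previous_siblings() yields nearest-first, so reverse to document order
directors = [a.get_text(strip=True)
             for a in separator.find_previous_siblings("a")][::-1]
stars = [a.get_text(strip=True)
         for a in separator.find_next_siblings("a")]

print(directors)
print(stars)
```

If an entry has no separator span (for example, a title that lists only stars), select_one returns None and the sibling lookups would fail, so real code would need a guard for that case.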

Upvotes: 1

Kasem Alsharaa

Reputation: 920

If there is no id or class that identifies those specific elements, you can simply iterate through the candidate tags and check whether each one contains the text you are looking for.
A working example on your HTML sample would be:

details = soup.find_all("p", attrs={"class": "text-muted text-small"})
for element in details:
    if "Stars" in element.text:
        stars = element.find_all("a")
        for star in stars:
            print(star.text)
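The loop above prints every link in the paragraph, so the director ends up mixed in with the stars. A possible extension of the same idea, sketched under the assumption that each group of links is preceded by a bare text label ending in a colon ("Director:", "Stars:"), as in the question's sample. The helper name split_credits is hypothetical, not part of Beautiful Soup:

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Trimmed copy of the markup from the question
html = """
<p class="text-muted text-small">
    Director:
<a href="/name/nm0001104/">Frank Darabont</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000209/">Tim Robbins</a>,
<a href="/name/nm0000151/">Morgan Freeman</a>,
<a href="/name/nm0348409/">Bob Gunton</a>,
<a href="/name/nm0006669/">William Sadler</a>
</p>
<p class="text-muted text-small">
<span class="text-muted">Votes:</span>
<span data-value="2187696" name="nv">2,187,696</span>
</p>
"""


def split_credits(p_tag):
    """Group link texts under the most recent 'Label:' text node (hypothetical helper)."""
    groups = {}
    label = None
    for node in p_tag.children:
        if isinstance(node, NavigableString):
            text = node.strip()
            # A direct text child such as "Director:" starts a new group
            if text.endswith(":"):
                label = text[:-1]
        elif isinstance(node, Tag) and node.name == "a" and label:
            groups.setdefault(label, []).append(node.get_text(strip=True))
    return groups


soup = BeautifulSoup(html, "html.parser")

credits = {}
for p_tag in soup.find_all("p", class_="text-muted text-small"):
    credits = split_credits(p_tag)
    if credits:  # the votes/gross paragraph has no labelled links
        break

print(credits.get("Director", []))
print(credits.get("Stars", []))
```

Keying on the text labels rather than on the separator span means a paragraph with only one of the two groups still yields a usable dictionary.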

Upvotes: 0
