Nazanin Zinouri

Reputation: 229

Understanding how to use BeautifulSoup find() to extract all elements inside a particular div of an HTML page

Here is the URL for what I am working with.

Sample html

I am trying to get all the values under "Username" from this HTML using soup.find(). I am not sure how to refer to the right div: the last div I managed to find by its id is soup.find("div", {"id": "sort-by"}), whose .contents returns:

['\n',
 <div id="sort-by-container">
 <div id="sort-by-current"><i aria-hidden="true" class="fa fa-sort"></i> <span id="sort-by-current-title">Sorted by: Followers</span></div>
 <div class="border-box no-select" id="sort-by-dropdown">
 <div class="sort-by-select" data-sort="most-followers" data-title="Sorted by: Followers">Sort by Followers</div>
 <div class="sort-by-select" data-sort="most-following" data-title="Sorted by: Following">Sort by Following</div>
 <div class="sort-by-select" data-sort="most-uploads" data-title="Sorted by: Uploads">Sort by Uploads</div>
 <div class="sort-by-select" data-sort="most-likes" data-title="Sorted by: Likes">Sort by Likes</div>
 </div>
 </div>,
 '\n',
 <div style="clear: both;"></div>]
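Note that .contents only lists the direct children of a tag (including the '\n' whitespace strings), whereas find_all() searches every descendant, however deeply nested. A minimal sketch on hypothetical markup, loosely modelled on the output above and not copied from the real page:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup loosely modelled on the "sort-by" output above;
# it is NOT the real Socialblade page.
html = """
<div id="sort-by">
  <div id="sort-by-container">
    <div class="sort-by-select" data-sort="most-followers">Sort by Followers</div>
    <div class="sort-by-select" data-sort="most-likes">Sort by Likes</div>
  </div>
  <div style="clear: both;"></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
sort_by = soup.find("div", {"id": "sort-by"})

# .contents holds only the direct children; keep the Tag objects and
# drop the whitespace strings between them.
direct_tags = [child for child in sort_by.contents if isinstance(child, Tag)]

# find_all() looks through all descendants, not just direct children.
options = sort_by.find_all("div", class_="sort-by-select")

print(len(direct_tags))  # the container and the clearing div
print([o.get_text() for o in options])
```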

Ultimately, I am trying to get each row under Username (charli d’amelio, addison rae, ...), i.e. the text content of each `<a href="">` tag.
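A common pattern for this is to narrow the search with find(), which returns only the first match, and then collect every nested `<a>` with find_all(). A minimal sketch, assuming a hypothetical container id "user-table" and row layout, since the real page structure isn't shown here:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the container id "user-table" and the row layout
# are assumptions for illustration, not the real page structure.
html = """
<div id="user-table">
  <div class="row"><div><a href="/user/a">charli d’amelio</a></div></div>
  <div class="row"><div><a href="/user/b">addison rae</a></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag: use it to locate the container.
table = soup.find("div", {"id": "user-table"})

# find_all() then returns every <a> nested anywhere inside that container.
usernames = [a.get_text(strip=True) for a in table.find_all("a")]
print(usernames)
```

On the real page, the container lookup would need whatever id or class actually wraps the rows.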

This is the full code I have tried so far:

from bs4 import BeautifulSoup

# Explicit encoding and parser; assumes the saved page is UTF-8.
with open('Top 50 TikTok users sorted by Followers - Socialblade TikTok Stats _ TikTok Statistics.html', encoding='utf-8') as file:
    soup = BeautifulSoup(file, 'html.parser')

print(soup.find('title').contents)
print(soup.find("div", {"id": "sort-by"}).contents)

Upvotes: 1

Views: 308

Answers (1)

MendelG

Reputation: 20058

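To find all the names under the "Username" column, you can use the :nth-of-type() CSS selector: div div:nth-of-type(n+5) > div > a. Here n+5 matches every div that is the fifth or later div among its siblings, so the first four div siblings are skipped.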
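To see which elements that selector picks, here is a small sketch on hypothetical markup: one outer div with four leading div siblings (standing in for whatever precedes the user rows) followed by two row divs. The layout is an assumption, not copied from Socialblade:

```python
from bs4 import BeautifulSoup

# Hypothetical layout: four leading <div> siblings followed by one <div>
# per user row. Not copied from the real page.
html = """
<div>
  <div><div><a href="#">header link</a></div></div>
  <div>filters</div>
  <div>column titles</div>
  <div>spacer</div>
  <div><div><a href="/u/1">charli d’amelio</a></div></div>
  <div><div><a href="/u/2">addison rae</a></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# :nth-of-type(n+5) only matches a <div> that is the 5th or later <div>
# among its siblings, so the link under the 1st div is not selected.
selector = "div div:nth-of-type(n+5) > div > a"
names = [a.get_text() for a in soup.select(selector)]
print(names)
```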

To use a CSS Selector, use the .select() method instead of .find_all().
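For comparison, here is a sketch of the same kind of lookup written both ways on hypothetical markup (the class "row" and id "users" are made up for illustration). find_all() has no CSS combinators, so the direct-child steps are spelled out with recursive=False:

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only to compare the two APIs.
html = """
<div id="users">
  <div class="row"><div><a href="/u/1">charli d’amelio</a></div></div>
  <div class="row"><div><a href="/u/2">addison rae</a></div></div>
  <p><a href="/about">about</a></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# .select() takes a CSS selector and returns a list of matching tags.
via_select = [a.get_text() for a in soup.select("div.row > div > a")]

# Rough find_all() equivalent: walk the rows, then take only direct
# <div> children and their direct <a> children (recursive=False).
via_find_all = []
for row in soup.find_all("div", class_="row"):
    for inner in row.find_all("div", recursive=False):
        for a in inner.find_all("a", recursive=False):
            via_find_all.append(a.get_text())

print(via_select == via_find_all, via_select)
```

For anything position-based like :nth-of-type(n+5), the .select() form is usually much shorter than an equivalent find_all() loop.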

In your example:

from bs4 import BeautifulSoup

with open("file.html", "r", encoding="utf-8") as file:
    # Parse the file's HTML directly
    soup = BeautifulSoup(file, "html.parser")

# <a> inside a <div> whose parent is the 5th-or-later <div> sibling
for tag in soup.select("div div:nth-of-type(n+5) > div > a"):
    print(tag.get_text(strip=True))

Output:

charli d’amelio
addison rae
Bella Poarch
Zach King
TikTok
...

Upvotes: 2
