Reputation: 229
Here is the URL for what I am working with.
I am trying to get all the values under Username from this HTML using `soup.find()`. I am not sure how to refer to this div, as the last div I can find with an id is `soup.find("div", {"id": "sort-by"}).contents`, which returns:
['\n',
<div id="sort-by-container">
<div id="sort-by-current"><i aria-hidden="true" class="fa fa-sort"></i> <span id="sort-by-current-title">Sorted by: Followers</span></div>
<div class="border-box no-select" id="sort-by-dropdown">
<div class="sort-by-select" data-sort="most-followers" data-title="Sorted by: Followers">Sort by Followers</div>
<div class="sort-by-select" data-sort="most-following" data-title="Sorted by: Following">Sort by Following</div>
<div class="sort-by-select" data-sort="most-uploads" data-title="Sorted by: Uploads">Sort by Uploads</div>
<div class="sort-by-select" data-sort="most-likes" data-title="Sorted by: Likes">Sort by Likes</div>
</div>
</div>,
'\n',
<div style="clear: both;"></div>]
Ultimately, I am trying to get each row under Username, such as charli d’amelio and addison rae, that is, the content of each `<a href="">` tag.
This is the full code I have tried so far:
from bs4 import BeautifulSoup
with open('Top 50 TikTok users sorted by Followers - Socialblade TikTok Stats _ TikTok Statistics.html') as file:
    soup = BeautifulSoup(file)
soup.find('title').contents
soup.find("div", {"id": "sort-by"}).contents
Upvotes: 1
Views: 308
Reputation: 20058
To find all the names under the "Username" column, you can use the `:nth-of-type(n)` CSS selector: `div div:nth-of-type(n+5) > div > a`. The `n+5` argument matches the fifth `div` among its siblings and every `div` after it. To use a CSS selector, call the `.select()` method instead of `.find_all()`.
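As a rough illustration of the difference between the two methods, here is a minimal sketch on markup trimmed from the snippet in the question (the inline `html` string and the variable names are assumptions made for the example). `.find_all()` takes a tag name and an attribute filter, while `.select()` takes a single CSS selector string:

```python
from bs4 import BeautifulSoup

# Markup trimmed from the question's snippet; the real page has more entries.
html = (
    '<div id="sort-by">'
    '<div class="sort-by-select">Sort by Followers</div>'
    '<div class="sort-by-select">Sort by Likes</div>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# .find_all() filters by tag name and attributes.
by_filter = soup.find_all("div", {"class": "sort-by-select"})

# .select() expresses the same query as one CSS selector string.
by_selector = soup.select("div#sort-by > div.sort-by-select")

labels = [tag.get_text() for tag in by_selector]
same = [tag.get_text() for tag in by_filter] == labels
print(labels, same)
```

Both calls should find the same two elements here; the selector form becomes more convenient once the query depends on position, as with `:nth-of-type`.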
In your example:
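from bs4 import BeautifulSoup

with open("file.html", "r", encoding="utf-8") as file:
    # Pass the file object directly; str(file.readlines()) would parse the repr of a list
    soup = BeautifulSoup(file, "html.parser")

# Each match is an <a> tag in the Username column
for tag in soup.select("div div:nth-of-type(n+5) > div > a"):
    print(tag.text)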
Output:
charli d’amelio
addison rae
Bella Poarch
Zach King
TikTok
...
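To see why `:nth-of-type(n+5)` skips the leading blocks, here is a small sketch on made-up markup. The structure, the `id="table"` attribute, and the names alice and bob are hypothetical and are not taken from the Socialblade page; the only assumption carried over is that the data rows start at the fifth sibling `div`:

```python
from bs4 import BeautifulSoup

# Hypothetical layout: four header divs, then one div per data row.
html = """
<div id="table">
  <div>Rank</div>
  <div>Grade</div>
  <div>Username</div>
  <div>Followers</div>
  <div><div><a href="/u/alice">alice</a></div></div>
  <div><div><a href="/u/bob">bob</a></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# n+5 matches the 5th, 6th, ... sibling div, so the four header divs are skipped.
names = [
    a.get_text(strip=True)
    for a in soup.select("div div:nth-of-type(n+5) > div > a")
]
print(names)
```

If the real page puts a different number of blocks before the first row, the offset in `n+5` would need to change accordingly.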
Upvotes: 2