Reputation: 35
I am running a scraper using Selenium and BeautifulSoup and I want to check whether a certain word is in <div...>.
A snippet of the HTML code is as follows:
<div data-asin="0974158232" data-index="0" data-uuid="1f362f6b-dde2-4377-a5f3-518513486b7d" data-component-type="s-search-result" class="s-result-item s-asin sg-col-0-of-12 sg-col-16-of-20 sg-col sg-col-12-of-16" data-component-id="14" data-cel-widget="search_result_0"><div class="sg-col-inner">
<div data-asin="" data-index="1" class="a-section a-spacing-none s-result-item s-flex-full-width s-border-bottom-none s-widget" data-cel-widget="search_result_1">
<div data-asin="" data-index="2" class="a-section a-spacing-none s-result-item s-flex-full-width s-border-bottom-none s-widget" data-cel-widget="search_result_2">
First of all, I would like to check whether data-asin="" is empty or contains a string, as in data-asin="0974158232".
If it is empty, I would like to look inside that <div...> for nested data-asin attributes. An example from div data-asin="" data-index="2" is:
> <div data-asin="" data-index="2" class="a-section a-spacing-none s-result-item s-flex-full-width s-border-bottom-none s-widget" data-cel-widget="search_result_2">
> <span cel_widget_id="MAIN-SEARCH_RESULTS-2" class="celwidget slot=MAIN template=SEARCH_RESULTS widgetId=fkmr-search-results" data-csa-c-id="9so6vg-imque6-h59746-o5az71" data-cel-widget="MAIN-SEARCH_RESULTS-2">
> <div class="s-result-list sg-row">
> <div class="s-result-item sg-col-16-of-20 sg-col sg-col-8-of-12 sg-col-12-of-16" data-cel-widget="search_result_3">
> <div data-asin="0974158216" data-index="0" data-uuid="99a1b582-2fcb-49b8-8d13-739783e460a5" data-component-type="s-search-result" class="s-result-item s-asin sg-col-0-of-12 sg-col-16-of-20 sg-col sg-col-12-of-16" data-component-id="15" data-cel-widget="search_result_4"><div class="sg-col-inner">
> <div data-asin="1433692163" data-index="1" data-uuid="8f8bfb8c-6083-4c26-bdd5-3032bcfe4bed" data-component-type="s-search-result" class="s-result-item s-asin sg-col-0-of-12 sg-col-16-of-20 sg-col sg-col-12-of-16" data-component-id="16" data-cel-widget="search_result_5">
Here, I would like the code to look at each nested data-asin and check whether it is an empty string. In this case it would not be empty, because we have <div data-asin="0974158216" and <div data-asin="1433692163".
I was thinking of using a for-loop or try/except, but I am very new to Selenium and HTML and do not know how to approach this problem. Any help would be deeply appreciated.
Upvotes: 0
Views: 658
Reputation: 195468
To find every <div> with a non-empty data-asin="..." attribute, you can use this example:
import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.com/s?k=A+Biblically+Based+Model+of+Cultural+Competence+in+the+Delivery+of+Healthcare+Services%3A+Seeing&ref=nb_sb_noss"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
    "Accept-Language": "en-US,en;q=0.5",
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

# passing bool as the filter keeps only <div>s whose data-asin is non-empty;
# print each ASIN together with the product title
for div in soup.find_all("div", {"data-asin": bool}):
    print(div["data-asin"], div.select_one(".a-text-normal").text)
Prints:
0974158232 A Biblically Based Model of Cultural Competence in the Delivery of Healthcare Services: Seeing
1433692163 Planting Missional Churches: Your Guide to Starting Churches that Multiply
0310341728 Less Than Perfect: Broken Men and Women of the Bible and What We Can Learn from Them
0800796853 God's Smuggler
1885904088 The Excellent Wife: A Biblical Perspective
B07K7YJPXD Hope Channel
B07F1DNGMS Alistair Begg - Truth For Life
B07DHZ6DL9 Star Trek Beyond (4K UHD)
B0010ZONIY Heart of the Ukulele
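If you also need to know which outer result a nested ASIN came from (the "empty outside, look inside" case from the question), something along these lines might work. This is only a sketch: the HTML string is a trimmed-down, made-up version of the snippet in the question, and `asins_per_result` is a hypothetical helper name, not part of BeautifulSoup. With Selenium you could presumably pass `driver.page_source` instead of the sample string.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup modelled on the snippet in the question.
HTML = """
<div data-asin="0974158232" data-index="0" class="s-result-item s-asin"></div>
<div data-asin="" data-index="1" class="s-result-item s-widget"></div>
<div data-asin="" data-index="2" class="s-result-item s-widget">
  <div class="s-result-list sg-row">
    <div data-asin="0974158216" data-index="0" class="s-result-item s-asin"></div>
    <div data-asin="1433692163" data-index="1" class="s-result-item s-asin"></div>
  </div>
</div>
"""


def asins_per_result(html):
    """Return (data-index, [ASINs]) for every outermost <div> carrying data-asin."""
    soup = BeautifulSoup(html, "html.parser")
    report = []
    # True matches any <div> that has the attribute, even when its value is "".
    for div in soup.find_all("div", attrs={"data-asin": True}):
        # Skip divs nested inside another data-asin div; the outer one handles them.
        if div.find_parent("div", attrs={"data-asin": True}) is not None:
            continue
        own = div["data-asin"]
        if own:
            # The outer div already carries an ASIN.
            asins = [own]
        else:
            # Empty outer value: collect the non-empty ASINs nested inside it.
            asins = [
                inner["data-asin"]
                for inner in div.find_all("div", attrs={"data-asin": True})
                if inner["data-asin"]
            ]
        report.append((div.get("data-index"), asins))
    return report


if __name__ == "__main__":
    for index, asins in asins_per_result(HTML):
        print(index, asins if asins else "no ASIN found")
```

An empty list for a given index means that neither the outer <div> nor anything inside it had a value, so no try/except is needed for that case.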
Upvotes: 1