econnoob5

Reputation: 35

Python Selenium, check if <div ...> contains a word in web-scraping code

I am running a scraper using Selenium and BeautifulSoup and I want to check whether a certain word is in <div...>.

A snippet of the HTML code is as follows:

<div data-asin="0974158232" data-index="0" data-uuid="1f362f6b-dde2-4377-a5f3-518513486b7d" data-component-type="s-search-result" class="s-result-item s-asin sg-col-0-of-12 sg-col-16-of-20 sg-col sg-col-12-of-16" data-component-id="14" data-cel-widget="search_result_0"><div class="sg-col-inner">
<div data-asin="" data-index="1" class="a-section a-spacing-none s-result-item s-flex-full-width s-border-bottom-none s-widget" data-cel-widget="search_result_1">
<div data-asin="" data-index="2" class="a-section a-spacing-none s-result-item s-flex-full-width s-border-bottom-none s-widget" data-cel-widget="search_result_2">

First, I would like to check whether the data-asin attribute of a <div> is empty (data-asin="") or holds a string, as in data-asin="0974158232".

If it is empty, I would like to look inside that <div ...> for nested elements that carry a data-asin. An example from the div with data-asin="" data-index="2" is:

> <div data-asin="" data-index="2" class="a-section a-spacing-none s-result-item s-flex-full-width s-border-bottom-none s-widget" data-cel-widget="search_result_2">
> <span cel_widget_id="MAIN-SEARCH_RESULTS-2" class="celwidget slot=MAIN template=SEARCH_RESULTS widgetId=fkmr-search-results" data-csa-c-id="9so6vg-imque6-h59746-o5az71" data-cel-widget="MAIN-SEARCH_RESULTS-2">
    > <div class="s-result-list sg-row">
       > <div class="s-result-item sg-col-16-of-20 sg-col sg-col-8-of-12 sg-col-12-of-16" data-cel-widget="search_result_3">
       > <div data-asin="0974158216" data-index="0" data-uuid="99a1b582-2fcb-49b8-8d13-739783e460a5" data-component-type="s-search-result" class="s-result-item s-asin sg-col-0-of-12 sg-col-16-of-20 sg-col sg-col-12-of-16" data-component-id="15" data-cel-widget="search_result_4"><div class="sg-col-inner">
       > <div data-asin="1433692163" data-index="1" data-uuid="8f8bfb8c-6083-4c26-bdd5-3032bcfe4bed" data-component-type="s-search-result" class="s-result-item s-asin sg-col-0-of-12 sg-col-16-of-20 sg-col sg-col-12-of-16" data-component-id="16" data-cel-widget="search_result_5">

Here, I would like the code to find each nested data-asin and check whether it is an empty string. In this case it would not be empty, because we have <div data-asin="0974158216" and <div data-asin="1433692163".

I was thinking of using a for-loop or try/except, but I am very new to Selenium and HTML and do not know how to approach this problem. Any kind of help would be deeply appreciated.

Upvotes: 0

Views: 658

Answers (1)

Andrej Kesely

Reputation: 195468

To find every <div> with a non-empty data-asin="..." attribute, you can use this example:

import requests
from bs4 import BeautifulSoup


url = "https://www.amazon.com/s?k=A+Biblically+Based+Model+of+Cultural+Competence+in+the+Delivery+of+Healthcare+Services%3A+Seeing&ref=nb_sb_noss"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
    "Accept-Language": "en-US,en;q=0.5",
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

# Passing bool as the filter keeps only divs whose data-asin value is truthy,
# i.e. present and non-empty; print each ASIN together with its title
for div in soup.find_all("div", {"data-asin": bool}):
    print(div["data-asin"], div.select_one(".a-text-normal").text)

Prints:

0974158232 A Biblically Based Model of Cultural Competence in the Delivery of Healthcare Services: Seeing 
1433692163 Planting Missional Churches: Your Guide to Starting Churches that Multiply 
0310341728 Less Than Perfect: Broken Men and Women of the Bible and What We Can Learn from Them 
0800796853 God's Smuggler 
1885904088 The Excellent Wife: A Biblical Perspective 
B07K7YJPXD Hope Channel 
B07F1DNGMS Alistair Begg - Truth For Life 
B07DHZ6DL9 Star Trek Beyond (4K UHD) 
B0010ZONIY Heart of the Ukulele 
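
If you also want to mirror the two-step check described in the question (test the outer data-asin first, and only look inside the <div> when it is empty), the following is a minimal sketch rather than a definitive implementation. The `html` string is a trimmed-down, hypothetical version of the markup from the question, and `asins_by_result` is an illustrative helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup modelled on the snippet in the question.
html = """
<div data-asin="0974158232" data-index="0" class="s-result-item s-asin"></div>
<div data-asin="" data-index="1" class="s-result-item s-widget"></div>
<div data-asin="" data-index="2" class="s-result-item s-widget">
  <span class="celwidget">
    <div class="s-result-list sg-row">
      <div data-asin="0974158216" data-index="0" class="s-result-item s-asin"></div>
      <div data-asin="1433692163" data-index="1" class="s-result-item s-asin"></div>
    </div>
  </span>
</div>
"""


def asins_by_result(soup):
    """Map each outer result's data-index to the ASINs found for it."""
    found = {}
    # attrs={"data-asin": True} matches any div that has the attribute,
    # including those where it is an empty string.
    for div in soup.find_all("div", attrs={"data-asin": True}):
        # Skip divs nested inside another data-asin container; they are
        # collected when their outer container is processed.
        if div.find_parent("div", attrs={"data-asin": True}) is not None:
            continue
        asin = div["data-asin"].strip()
        if asin:
            # The outer div already carries an ASIN.
            found[div.get("data-index")] = [asin]
        else:
            # Empty outer ASIN: look inside for nested non-empty ones.
            nested = div.find_all("div", attrs={"data-asin": bool})
            found[div.get("data-index")] = [d["data-asin"] for d in nested]
    return found


soup = BeautifulSoup(html, "html.parser")
result = asins_by_result(soup)
print(result)
```

An empty list in the result means the outer <div> had an empty data-asin and nothing usable inside it. When the page is loaded through Selenium, as in the question, it should be possible to pass `driver.page_source` to `BeautifulSoup` in place of the `html` string, assuming `driver` is your already-initialized WebDriver.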

Upvotes: 1
