Pro Girl

Reputation: 952

How to extract an exact css selector

I'm trying to use BeautifulSoup to extract only the divs that exactly match a CSS attribute selector.

I've already read two related posts, but they don't solve my issue.

The divs which I want to extract are only the following:

<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="aa batteries" data-nid="" data-reftag="nb_sb_ss_i_3_1" data-store="" data-type="a9" id="issDiv2"><span class="s-heavy"></span>a<span class="s-heavy">a batteries</span></div>

They must contain data-alias="aps", not just any data-alias= attribute (there are many other divs with different values, such as data-alias="gift-cards").

This is the code I've tried.

from selenium import webdriver
from bs4 import BeautifulSoup
import time

browser = webdriver.Chrome('chromedriver.exe')
mainUrl = "https://www.amazon.com/"
browser.get(mainUrl)
mainSoup = BeautifulSoup(browser.page_source, "html.parser")
searchInput = browser.find_element_by_xpath('//input[@id="twotabsearchtextbox"]')
searchInput.clear()
searchInput.send_keys('a')
time.sleep(2)
searchSoup = BeautifulSoup(browser.page_source, "html.parser")
searchResult = searchSoup.find_all('div', attrs={'id': 'suggestions-template'})
keys = searchSoup.select('div[data-alias]')
for key in keys:
    print(key)

This is the result that I get:

<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="amazon gift cards" data-nid="" data-reftag="nb_sb_ss_i_1_1" data-store="" data-type="a9" id="issDiv0"><span class="s-heavy"></span>a<span class="s-heavy">mazon gift cards</span></div>
<div class="s-suggestion" data-alias="gift-cards" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="amazon gift cards" data-nid="" data-reftag="nb_sb_ss_c_2_1" data-store="Gift Cards" data-type="a9-xcat" id="issDiv1"> <span class="a-size-mini" style="padding-left: 16pt">in <span class="a-color-tertiary">Gift Cards</span></span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="aa batteries" data-nid="" data-reftag="nb_sb_ss_i_3_1" data-store="" data-type="a9" id="issDiv2"><span class="s-heavy"></span>a<span class="s-heavy">a batteries</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="aaa batteries" data-nid="" data-reftag="nb_sb_ss_i_4_1" data-store="" data-type="a9" id="issDiv3"><span class="s-heavy"></span>a<span class="s-heavy">aa batteries</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="airpod case" data-nid="" data-reftag="nb_sb_ss_i_5_1" data-store="" data-type="a9" id="issDiv4"><span class="s-heavy"></span>a<span class="s-heavy">irpod case</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="apple watch band 38mm" data-nid="" data-reftag="nb_sb_ss_i_6_1" data-store="" data-type="a9" id="issDiv5"><span class="s-heavy"></span>a<span class="s-heavy">pple watch band 38mm</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="apple watch" data-nid="" data-reftag="nb_sb_ss_i_7_1" data-store="" data-type="a9" id="issDiv6"><span class="s-heavy"></span>a<span class="s-heavy">pple watch</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="airpods" data-nid="" data-reftag="nb_sb_ss_i_8_1" data-store="" data-type="a9" id="issDiv7"><span class="s-heavy"></span>a<span class="s-heavy">irpods</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="apple watch band 42mm" data-nid="" data-reftag="nb_sb_ss_i_9_1" data-store="" data-type="a9" id="issDiv8"><span class="s-heavy"></span>a<span class="s-heavy">pple watch band 42mm</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="alexa" data-nid="" data-reftag="nb_sb_ss_i_10_1" data-store="" data-type="a9" id="issDiv9"><span class="s-heavy"></span>a<span class="s-heavy">lexa</span></div>
<div class="s-suggestion" data-alias="aps" data-crid="3LY5DQXGQLBAV" data-isfb="false" data-issc="false" data-keyword="apple watch charger" data-nid="" data-reftag="nb_sb_ss_i_11_1" data-store="" data-type="a9" id="issDiv10"><span class="s-heavy"></span>a<span class="s-heavy">pple watch charger</span></div>

I also tried replacing the selector with:

keys = searchSoup.select('div[data-alias]="aps"')

but I get this error:

SyntaxError: Invalid character '=' at position 15

How do I get only the divs with data-alias="aps"? Thanks

Upvotes: 0

Views: 102

Answers (1)

Pro Girl

Reputation: 952

Problem solved: I was putting the quotation marks in the wrong position. The attribute value has to go inside the square brackets, like this:

keys = searchSoup.select('div[data-alias="aps"]')
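For anyone who wants to check the difference without driving a browser, here is a minimal sketch. It assumes a small hand-written HTML snippet (trimmed from the divs in the question, not the live Amazon page) and compares the presence selector div[data-alias] with the exact-value selector div[data-alias="aps"]. The variable names are just for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the suggestion divs in the question.
html = """
<div class="s-suggestion" data-alias="aps" data-keyword="aa batteries" id="issDiv2"></div>
<div class="s-suggestion" data-alias="gift-cards" data-keyword="amazon gift cards" id="issDiv1"></div>
<div class="s-suggestion" data-alias="aps" data-keyword="alexa" id="issDiv9"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# [data-alias] matches any div that has the attribute, whatever its value.
present = soup.select("div[data-alias]")

# [data-alias="aps"] matches only divs whose attribute value is exactly "aps".
exact = soup.select('div[data-alias="aps"]')

# Equivalent lookup without a CSS selector, using find_all with an attrs dict.
via_find_all = soup.find_all("div", attrs={"data-alias": "aps"})

keywords = [div["data-keyword"] for div in exact]
print(len(present))  # 3
print(keywords)      # ['aa batteries', 'alexa']
```

The gift-cards div is matched by the presence selector but dropped by the exact-value one. find_all with attrs={"data-alias": "aps"} should give the same two divs if you prefer to avoid CSS selector syntax.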

Upvotes: 1
