Viswa

Reputation: 43

Scraping Prices in AUD from USD in Python Selenium - Web Scraping

I'm new to the Python Selenium WebDriver. I'm writing a script to scrape prices from https://www.bhphotovideo.com/c/buy/Digital-Cameras/ci/9811/N/4288586282, but I want the prices in AUD rather than USD. The footer of the site has a Currency dropdown that lists the available currencies. After you click it and select Australian Dollar, the site immediately refreshes and shows prices in both USD and AUD. I need to scrape only the AUD prices, not the USD ones. How should I do that?

Here is my code:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.bhphotovideo.com/c/buy/Digital-Cameras/ci/9811/N/4288586282")

price = driver.find_elements_by_xpath('//span[@data-selenium = "uppedDecimalPriceFirst"]')

for i in range(0, len(price)):

    print("Prices in USD : " + price[i].text)

driver.close()

Output:

Price : $3,899

Expected Output:

Price : $5462.50

[Screenshot: currency list]

[Screenshot: prices in AUD after selecting AUD in the dropdown]

Upvotes: 4

Views: 334

Answers (2)

baduker

Reputation: 20052

Here's a slightly different approach that doesn't use Selenium.

The currency code is passed in one of the cookie values (note cur=AUD in the lpi and dpi cookies below). So you can copy all the cookies from a browser session, parse them, and send them with the request.

The downside is that these cookies expire fairly quickly.

from http.cookies import SimpleCookie
import requests
from bs4 import BeautifulSoup


headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
    "cache-control": "max-age=0",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "same-origin",
    "referer": "https://www.bhphotovideo.com/c/buy/Digital-Cameras/ci/9811/N/4288586282",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"
}


url = "https://www.bhphotovideo.com/c/buy/Digital-Cameras/ci/9811/N/4288586282"

rawdata = """
__cfduid=d3bdb54963e368d44b5e6884f7d73e51c1602143416; dcid=1602143417279-81384671; mapp=0; cartId=22850067748; locale=en; aperture-be-commit-id=n/a; D_SID=62.96.159.233:3ohrnHfDBx/FJKw5rqCYwC+IsePPxHUI0sfORzmYLhg; aperture-be-commit-id=n/a; ftr_ncd=6; __cfruid=be69be948af2b86ea99a0bb76bfe96f1f2e949cf-1602145612; BHFOTO_ENSIGHTEN_PRIVACY_BANNER_VIEWED=1; _fbp=fb.1.1602146128792.1914192519; SSID_C=CABIZh0AAAAAAAC5xH5fXRRAGLnEfl8DAAAAAAAAAAAA4th-XwANyA; SSSC_C=333.G6881153579923543133.3|0.0; my_cookie=!qltlzDwS7kQ/zfYbOLfqaMDV2Dx0K84+1Np3voLzYtZJpbdmGM1aRtff/nmQJ3K2+8RhwqtQU7mqko0njfvupkBbTQcSqyQsQqzSit5MsDRmQAltfZQJhGdGL7+7DP37WbBdLTWyfrIPcBHn72OdD66SAh/iiPaOqzyT2wI6ifXP0la2fofeIPxkkD5gcQkibDJFDdy6FQ==; build=20200925v10-20200925v10; TS0188dba5_77=080f850069ab28005f16d063631902e3794db1c0edc9f4c6f6ef0d1eac6527cf99ad1b8c27823c9b69571149d6b96f090814e6aa558240000b1940db61379359c992a1feb444e1125e0df1fe6100edbe2d3b20d0751edcc1b64b1e00a346fdc9f9abcfadd468699787be03341403bd5353fdb3a2e5d8deb9; cookieID=221356360871602130569547; sessionKey=1e8aaddd-62ef-4dee-82c8-6f35fcc57a71; utkn=2870955d449ed2798d6d41e4b34cb3d7; SSRT_C=JeR-XwQBAA; SSPV_C=NOgAAAAAAAAAAAAAAAAAAAAAAA4AAAAAAAAAAAAA; pvid=1602151461750-67341336; TS01c1e793=01ec39615f9aa795b03ad73db5ae3ea9bfaccd846205e248eefb69ecbb9655964dbd97008d269c15dc4da0f79c2a904db5fbb4141ad60dd5e19c1cc1945cac3435d54cc822813399ba0563719ef2813d1e35f54430; TS01472329=01ec39615fa04d4ed3ebebb173c30f0c52eb5a398a02c6c93717c12f32ed8f6055a8528125ec9a7ae65ddb2b93d1ec5f99cb9ccd51141361e57c4a0c54bd1b9ee137e996cb; __cf_bm=78744b2be0c5790321c0fca10c6e9f49811df789-1602151463-1800-AePVD5wqk3ezKZPdl1gG2H/9EkXrrYhfPhrO21Gd/A1AXsfyYZ1fAzpSulkj6Xjvm6uigtGKCUxaJwuohJ0a96nrAfe18dExkTncLlDc/8JQFjcTyDM12xKX5H2bdGrE2Q==; D_IID=80EBCC02-1B08-3947-9BBA-B7DE344C0CC2; D_UID=57631A10-AD80-379F-B193-E3144BE816A6; D_ZID=3E416AAF-00A8-38F3-8951-18E587B713F3; D_ZUID=DD74D4B2-8918-3C31-86CE-559CD726FA66; D_HID=10B0A111-9447-3A05-9D56-B60C9EAF8E57; uui=800.952.1815|; 
TS0119d048=01ec39615fa9a4fe4090a627d643ced60a42832e7505e248eefb69ecbb9655964dbd97008dfb49159bf4a23094ef84f4adea4642e74d190625c36f2dea052a14c1a672d5414f4d44f6abe7dece2da3d4b346965a974f110d88d050837c31a114c3cab7eb945e08f785ed3175f6370325edc449813d420357a41783aacbf1ed6284c67427c4b9e6beea94f05f3e544e7d2e8477c0e326fa6bca1d24896f750eb456e07d71135edc2e20bbb3214751b624bf57cf6a7bfc2de1a5012e41a12bb765b0518572b5; JSESSIONID=P28Hq0Itv8QxVhGCV0uNtM3VtH_upfSq!230185629; TS0188dba5=01ec39615f507d745bcb1186593a0c4b69a62e6cafecbc31323cbb6cc681bcdb22e0b477baa125ba7d0a82f94f7c3d8be8a5accb5725a00585f2206c375d2f0fcc1b23e501f4ae68f76fbdd16dc18929fd78ad8c360ea424f8b14a1a3a19c4ebcbf0d33a56; forterToken=4d9bbef1ef224c25a2d18be549978618_1602151464974__UDF43_6; TopBarCart=0|0; TS01e1f1fd=01ec39615f2d39ed2dad52a73051682081c331a6fc855ff8177aa2b38dd47ee53287ef19ce9149fa03daf9e245f740107ae7418eac935eda28bf46197e4c0912c7dec3d9e9809839dfcb4f3ba2b8578ef3aae5f6757093319d1016d4672421656801801d7ea0c51b9fee103e26b11616c645918809698bacf0f9d2ff7c417c991064c74acf03fb9615419622d2f45342df9551e738; TS01d628c4=01ec39615f10de3e4775311fe5669af070e90c5813855ff8177aa2b38dd47ee53287ef19ce49233bd9f6eaae68737da6b926ec55365d9ba728190cac27d9b2935672943d9703024c3347affe42969f5767337f2aaeb26763a2924b02d56124b0fde952683157cf258c2ee5ac1e8b34d61653354138d80dfcfdec9c05278f413d99b988b82b598ac9cbf9d13cb074d64f3964183a340e1eec613eedff8014938db974924ef90a531f09bebeffe76b18ef64d0544f71236390b7f94621442a9a77bb33f8710dcdfb8456c5ddc25c72ec9c9bb5d9bfcd; app_cookie=1602151489; lpi=cat=2,cur=AUD,app=D,lang=E,view=L,lgdin=N,cache=release-WEB-20200924v10-BHJ-DVB25488-3,ipp=24,view=L,sort=BS,priv=Y,state=NY; dpi=cat=2,cur=AUD,app=D,lang=E,view=L,lgdin=N,cache=release-WEB-20200924v10-BHJ-DVB25488-3"""
cookie = SimpleCookie()
cookie.load(rawdata)

cookies = {}
for key, morsel in cookie.items():
    cookies[key] = morsel.value

response = requests.get(url, headers=headers, cookies=cookies)
soup = BeautifulSoup(response.content, "html.parser")


p_name = soup.find_all("span", {"data-selenium": "miniProductPageProductName"})
p_conv = soup.find_all("div", {"data-selenium": "miniProductPageProductConversion"})
f_currency = soup.find_all("div", {"data-selenium": "miniProductPagePricingForeignCurrency"})

product_name = [t.getText() for t in p_name]
price = [t.find("span").getText() for t in p_conv]
foreign_price = [t.getText() for t in f_currency]

for n, u, a in zip(product_name, price, foreign_price):
    print(f"{n} - {u} - {a}")

Sample output:

Canon EOS R5 Mirrorless Digital Camera (Body Only) - $3,89900 - AUD $5,462.50
Sony Alpha a7S III Mirrorless Digital Camera (Body Only) - $3,49800 - AUD $4,900.70
Canon EOS 5D Mark IV DSLR Camera Body with Accessory Kit - $2,49900 - AUD $3,501.10
Canon EOS R6 Mirrorless Digital Camera (Body Only) - $2,49900 - AUD $3,501.10
Sony Alpha a7 III Mirrorless Digital Camera Body with Accessory Kit - $1,99800 - AUD $2,799.20
Sony ZV-1 Digital Camera - $79800 - AUD $1,118.00
Canon EOS R6 Mirrorless Digital Camera with 24-105mm f/4-7.1 Lens - $2,79900 - AUD $3,921.40

To get the raw cookie string, set the currency on the site first, then copy the Cookie value from the request headers (for example, in the browser's developer tools).
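
Since these cookies expire quickly, it can help to confirm that the copied string still carries the currency before sending the request. The following is a rough sketch, assuming the `lpi` cookie keeps the comma-separated key=value layout visible in the cookie string above. Both helper names are hypothetical, and the short `raw` value is a trimmed stand-in for the real header.

```python
def parse_cookie_header(raw: str) -> dict:
    """Split a raw Cookie header value into a name -> value dict."""
    cookies = {}
    for pair in raw.split(";"):
        # Split on the first '=' only; values may contain '=' themselves.
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def currency_from_lpi(lpi_value: str):
    """Return the 'cur' field of a comma-separated key=value string, or None."""
    for field in lpi_value.split(","):
        key, _, value = field.partition("=")
        if key == "cur":
            return value
    return None


# Trimmed stand-in for the copied header (assumption: same layout as above).
raw = "locale=en; app_cookie=1602151489; lpi=cat=2,cur=AUD,app=D,lang=E"
cookies = parse_cookie_header(raw)
print(currency_from_lpi(cookies["lpi"]))
```

If this prints something other than AUD, or the `lpi` key is missing, the string was probably copied before the currency was changed.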


Upvotes: 2

Ice Bear

Reputation: 4086

What you can do is use the .send_keys() function, as in the code below. I believe you need to get the <select> tag first.

Probably a similar answer here.

# Put the dropdown's id value here, or locate it by class or XPath; any locator works as long as you get the element :)
dropdown_currency = driver.find_element_by_id("ID-value")

dropdown_currency.send_keys("Australian Dollar")

Do this before you start your scraping process...

UPDATE!

I didn't realize the site doesn't use a <select> tag, so try this instead. Since the value shown in the dropdown comes from the p tag (go check it out), one way to do this is to set the innerHTML of that tag to what you want, in this case "Australian Dollar". Note that this alone still won't scrape the Australian Dollar price, because that price is in a different tag.

You can still get it yourself, though: once the AUD prices are shown, do exactly what you did to get the USD prices. You just need to find the corresponding elements or tags.

I also suggest implementing the part I commented out. It's good practice, especially with Selenium: it waits for elements to load, which helps avoid errors when locating them. By the way, the 20 there means wait up to 20 seconds.

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.bhphotovideo.com/c/buy/Digital-Cameras/ci/9811/N/4288586282")

#try:
#   items_container = WebDriverWait(driver,20).until(
#       EC.presence_of_element_located((By.ID,"featured"))
#       )

div_dropdown = driver.find_element_by_class_name("currencySelect")
p_value = div_dropdown.find_element_by_tag_name("a")
driver.execute_script("arguments[0].innerHTML = 'Australian Dollar';",p_value)



price = driver.find_elements_by_xpath('//span[@data-selenium = "uppedDecimalPriceFirst"]')

for i in range(0, len(price)):

    print("Prices in USD : " + price[i].text)

#driver.close()

With WebDriverWait

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome()
driver.get("https://www.bhphotovideo.com/c/buy/Digital-Cameras/ci/9811/N/4288586282")

try:
    # Class of the block that contains the listings to be scraped
    products_block_class = "listings_17u0Luvydb9lNARSsSl8xJ"

    # Wait up to 20 seconds for the listings block to be present
    items_container = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CLASS_NAME, products_block_class))
    )

    div_dropdown = driver.find_element_by_class_name("currencySelect")
    p_value = div_dropdown.find_element_by_tag_name("a")
    driver.execute_script("arguments[0].innerHTML = 'Australian Dollar';", p_value)

    price = driver.find_elements_by_xpath('//span[@data-selenium = "uppedDecimalPriceFirst"]')

    for i in range(0, len(price)):
        print("Prices in USD : " + price[i].text)

    #driver.close()
except Exception as error:
    # Exception (not BaseException) so Ctrl+C still interrupts the script
    print("What happened? ", error)

---OUTPUT SAMPLE----

[Screenshot: aud]

Upvotes: 0
