Reputation: 3060
I am trying to get product prices from this website: https://www.cotswold-fayre.co.uk/. The site requires authentication before it shows product prices. I wrote a small Python script to extract them: it posts the username and password, and the login request returns status code 200, so authentication appeared to succeed. However, when I request the product pages with the same session, the price still shows as 'Product Price: Login To Buy'.
import requests
from bs4 import BeautifulSoup
import pandas as pd

username = '[email protected]'
password = 'test@123'
num_pages = 10

# Create a session
session = requests.Session()
login_url = 'https://www.cotswold-fayre.co.uk/login'
login_data = {
    'email': username,
    'password': password
}
session.post(login_url, data=login_data)

# Initialize lists to store the extracted information
links = []
titles = []
model_numbers = []
barcodes = []
prices = []

# Iterate over each page
for page in range(1, num_pages + 1):
    # Send a GET request to the website for each page
    url = f'https://www.cotswold-fayre.co.uk/products/?page={page}'
    response = session.get(url)
    # Create a BeautifulSoup object to parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')
    # Find all the product elements on the page
    product_elements = soup.find_all('div', class_='product-item-info')
    print(product_elements)
I noticed that after a successful login in the web UI, I get an HTTP 302 status. How should I handle this situation?
Upvotes: 0
Views: 49
Reputation: 2026
You don't need to handle login. If you look at the individual product pages (the child links), the product information is typically exposed in <meta> tags, and this site is no exception.
import requests
from bs4 import BeautifulSoup

num_pages = 2

# Iterate over each page
for page in range(1, num_pages + 1):
    url = f'https://www.cotswold-fayre.co.uk/products/?page={page}'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    links = soup.find_all('a', {'class': 'product-item-link'})
    for item in links:
        sub_response = requests.get(item['href'])
        soup_item = BeautifulSoup(sub_response.content, 'html.parser')
        metas = soup_item.find_all('meta')
        print(item['href'])
        for meta in metas:
            print(meta)
https://www.cotswold-fayre.co.uk/soda-classic-lime-vodka-soda-hard-seltzer-5-abv-12-x-330ml.html
<meta charset="utf-8"/>
<meta content="&SODA - Classic Lime Vodka & Soda Hard Seltzer ABV 5% - 12 x 330ml" name="title"/>
<meta content="INDEX,FOLLOW" name="robots"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<meta content="telephone=no" name="format-detection"/>
<meta content="product" property="og:type"/>
<meta content="&SODA - Classic Lime Vodka & Soda Hard Seltzer ABV 5% - 12 x 330ml" property="og:title"/>
<meta content="https://www.cotswold-fayre.co.uk/media/catalog/product/cache/05d29925af2e2ffcb865b18de40ea164/S/O/SOD01_1.jpg" property="og:image"/>
<meta content="&SODA - Classic Lime Vodka & Soda Hard Seltzer ABV 5% - 12 x 330ml" property="og:description"/>
<meta content="https://www.cotswold-fayre.co.uk/soda-classic-lime-vodka-soda-hard-seltzer-5-abv-12-x-330ml.html" property="og:url"/>
<meta content="19.4" property="product:price:amount"/>
<meta content="GBP" property="product:price:currency"/>
So the price is 19.4 GBP.
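If you only want the price rather than every tag, something along these lines should work. This is a minimal sketch, not a tested scraper for the whole site: `extract_price` and `SAMPLE_HTML` are names made up for illustration, and the sample markup is trimmed from the output above so the snippet runs without hitting the site. It assumes the price is exposed in the `product:price:amount` and `product:price:currency` meta tags, as on the page shown.

```python
from bs4 import BeautifulSoup

# Sample markup trimmed from the output above (hypothetical stand-in for a
# real product page, so the example runs offline).
SAMPLE_HTML = """
<html><head>
<meta content="product" property="og:type"/>
<meta content="19.4" property="product:price:amount"/>
<meta content="GBP" property="product:price:currency"/>
</head><body></body></html>
"""


def extract_price(html):
    """Return (amount, currency) from the product:price meta tags, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    amount_tag = soup.find('meta', attrs={'property': 'product:price:amount'})
    currency_tag = soup.find('meta', attrs={'property': 'product:price:currency'})
    if amount_tag is None or currency_tag is None:
        return None  # the page does not expose the price this way
    amount = amount_tag.get('content')
    currency = currency_tag.get('content')
    if amount is None or currency is None:
        return None  # tag present but without a content attribute
    return float(amount), currency


print(extract_price(SAMPLE_HTML))  # (19.4, 'GBP')
```

In the loop above you would call `extract_price(sub_response.content)` for each product link and collect the results instead of printing every meta tag.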
Upvotes: 2