Learn Hadoop

Reputation: 3060

Python web data extraction using BeautifulSoup module

I am trying to get product prices from this website: https://www.cotswold-fayre.co.uk/. The site requires authentication before it shows product prices, so we developed a small Python script to extract them. In Python we pass the username and password, the authentication succeeds and we receive status code 200. But after the session is created, when we try to get the product price we get 'Product Price: Login To Buy'.

import requests
from bs4 import BeautifulSoup
import pandas as pd


username = 'your_email@example.com'
password = 'test@123'


num_pages = 10

# Create a session
session = requests.Session()


login_url = 'https://www.cotswold-fayre.co.uk/login'
login_data = {
    'email': username,
    'password': password
}
session.post(login_url, data=login_data)

# Initialize lists to store the extracted information
links = []
titles = []
model_numbers = []
barcodes = []
prices = []

# Iterate over each page
for page in range(1, num_pages + 1):
    # Send a GET request to the website for each page
    url = f'https://www.cotswold-fayre.co.uk/products/?page={page}'
    response = session.get(url)

    # Create a BeautifulSoup object to parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')
    # Find all the product elements on the page
    product_elements = soup.find_all('div', class_='product-item-info')
    print(product_elements)
    

I noticed that after a successful login in the web UI, I get an HTTP 302 status. How should I handle this situation?
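One likely explanation for seeing 200 in Python but 302 in the browser: requests follows redirects by default, so the 302 returned by the login POST is followed automatically and session.post() reports the status of the final page. A 200 therefore does not prove the login worked. The sketch below assumes the login URL and form field names from the question (they are not verified against the site, and real login forms often also require hidden fields such as a CSRF token). It prints every redirect hop and then checks whether the 'Login To Buy' text is still present. The helper names redirect_chain, looks_logged_in and check_login are hypothetical.

```python
import requests


def redirect_chain(response: requests.Response) -> list:
    """Return (status_code, url) for each redirect hop plus the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def looks_logged_in(html: str, marker: str = "Login To Buy") -> bool:
    """Heuristic: anonymous visitors see the marker text instead of a price."""
    return marker not in html


def check_login(session: requests.Session, login_url: str,
                form_data: dict, probe_url: str) -> bool:
    """Log in, print the redirect chain, then probe a product page with the same session."""
    # Assumption: login_url and the form_data keys match what the site's login form posts.
    resp = session.post(login_url, data=form_data, timeout=30)
    for status, url in redirect_chain(resp):
        # Any 3xx entries are redirects that requests followed automatically.
        print(status, url)
    probe = session.get(probe_url, timeout=30)
    return looks_logged_in(probe.text)
```

For example: check_login(requests.Session(), 'https://www.cotswold-fayre.co.uk/login', {'email': username, 'password': password}, 'https://www.cotswold-fayre.co.uk/products/?page=1'). To see the 302 itself instead of following it, pass allow_redirects=False to session.post() and read response.headers['Location']. If the chain ends back on a login page, the POST was most likely rejected, and the browser's network tab shows the exact form fields the real login sends.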

Upvotes: 0

Views: 49

Answers (1)

Alfred Luu

Reputation: 2026

You don't need to handle the login. If you look at each product's own page (the child link), it typically exposes the product information in <meta> tags, and this site follows that pattern too.

import requests
from bs4 import BeautifulSoup

num_pages = 2

# Iterate over each page
for page in range(1, num_pages + 1):
    url = f'https://www.cotswold-fayre.co.uk/products/?page={page}'
    response = requests.get(url)

    soup = BeautifulSoup(response.content, 'html.parser')
    links = soup.find_all('a', {'class': 'product-item-link'})
    for item in links:
        sub_response = requests.get(item['href'])
        soup_item = BeautifulSoup(sub_response.content, 'html.parser')
        metas = soup_item.find_all('meta')
        print(item['href'])
        for meta in metas:
            print(meta)
    
https://www.cotswold-fayre.co.uk/soda-classic-lime-vodka-soda-hard-seltzer-5-abv-12-x-330ml.html
<meta charset="utf-8"/>
<meta content="&amp;SODA - Classic Lime Vodka &amp; Soda Hard Seltzer ABV 5% - 12 x 330ml" name="title"/>
<meta content="INDEX,FOLLOW" name="robots"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<meta content="telephone=no" name="format-detection"/>
<meta content="product" property="og:type"/>
<meta content="&amp;SODA - Classic Lime Vodka &amp; Soda Hard Seltzer ABV 5% - 12 x 330ml" property="og:title"/>
<meta content="https://www.cotswold-fayre.co.uk/media/catalog/product/cache/05d29925af2e2ffcb865b18de40ea164/S/O/SOD01_1.jpg" property="og:image"/>
<meta content="&amp;SODA - Classic Lime Vodka &amp; Soda Hard Seltzer ABV 5% - 12 x 330ml" property="og:description"/>
<meta content="https://www.cotswold-fayre.co.uk/soda-classic-lime-vodka-soda-hard-seltzer-5-abv-12-x-330ml.html" property="og:url"/>
<meta content="19.4" property="product:price:amount"/>
<meta content="GBP" property="product:price:currency"/>

So the price is 19.4 GBP, read from the product:price:amount and product:price:currency meta tags without logging in.
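To turn this approach into the table the question was building (links, titles, prices), the loop can reuse one requests.Session and read only the price-related meta tags instead of printing all of them. This is a sketch under assumptions: the product-item-link class and the og:title / product:price:* properties are taken from the output above and may change if the site is redesigned; product_links, product_row and scrape are hypothetical helper names, and the one-second delay is an arbitrary politeness choice.

```python
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.cotswold-fayre.co.uk/products/"


def product_links(listing_html: str, base_url: str = BASE_URL) -> list:
    """Collect absolute product URLs from one listing page."""
    soup = BeautifulSoup(listing_html, "html.parser")
    return [urljoin(base_url, a["href"])
            for a in soup.find_all("a", class_="product-item-link", href=True)]


def product_row(url: str, product_html: str) -> dict:
    """Build one row from a product page's <meta> tags; missing tags become None."""
    soup = BeautifulSoup(product_html, "html.parser")

    def meta(prop: str):
        tag = soup.find("meta", attrs={"property": prop})
        return tag.get("content") if tag is not None else None

    return {
        "link": url,
        "title": meta("og:title"),
        "price": meta("product:price:amount"),
        "currency": meta("product:price:currency"),
    }


def scrape(num_pages: int = 2, delay: float = 1.0) -> list:
    """Walk the listing pages and return one dict per product."""
    rows = []
    with requests.Session() as session:  # one session reuses connections
        for page in range(1, num_pages + 1):
            listing = session.get(BASE_URL, params={"page": page}, timeout=30)
            listing.raise_for_status()
            for link in product_links(listing.text):
                resp = session.get(link, timeout=30)
                resp.raise_for_status()
                rows.append(product_row(link, resp.text))
                time.sleep(delay)  # pause between product requests
    return rows
```

The rows can then go into the DataFrame the question set out to build, e.g. df = pd.DataFrame(scrape(num_pages=10)) followed by df['price'] = pd.to_numeric(df['price']). If a product page lacks one of the meta tags, its row keeps None instead of raising.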

Upvotes: 2
