Python web data extraction using BeautifulSoup module

Question

I am trying to get product prices from this website: https://www.cotswold-fayre.co.uk/. The site requires authentication before it shows a product's price. We wrote a small Python script to extract the prices. The script posts the username and password, and authentication appeared to succeed: we received status code 200. However, once the session is created and we try to read the product price, we get 'Product Price: Login To Buy'.

import requests
from bs4 import BeautifulSoup
import pandas as pd


username = 'testuser@gmail.com'
password = 'test@123'


num_pages = 10

# Create a session
session = requests.Session()


login_url = 'https://www.cotswold-fayre.co.uk/login'
login_data = {
    'email': username,
    'password': password
}
session.post(login_url, data=login_data)

# Initialize lists to store the extracted information
links = []
titles = []
model_numbers = []
barcodes = []
prices = []

# Iterate over each page
for page in range(1, num_pages + 1):
    # Send a GET request to the website for each page
    url = f'https://www.cotswold-fayre.co.uk/products/?page={page}'
    response = session.get(url)

    # Create a BeautifulSoup object to parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')
    # Find all the product elements on the page
    product_elements = soup.find_all('div', class_='product-item-info')
    print(product_elements)

I noticed that after a successful login in the browser, the web UI returns HTTP status 302. How should I handle this?
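On the 302 itself: requests follows redirects by default, so the 200 seen in the script may only be the last hop of a redirect chain, and it does not prove that the login worked. The intermediate responses are kept in `response.history`. The sketch below is a minimal way to inspect that chain and to check for the 'Login To Buy' marker quoted above. The helper names `redirect_chain` and `looks_logged_in` are hypothetical, and the site may also expect extra hidden form fields in the login POST, which is an assumption that has not been checked here.

```python
def redirect_chain(response):
    """Return [(status_code, url), ...] for every hop, ending with the final response.

    Works on a requests.Response: `history` holds the redirects that were followed.
    """
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def looks_logged_in(response, marker="Login To Buy"):
    """Heuristic only: treat the page as logged in if the marker text is absent."""
    return marker not in response.text
```

With the session from the question, `redirect_chain(session.post(login_url, data=login_data))` would list any 302 hops before the final 200, and `looks_logged_in(session.get(url))` gives a rough signal on a product page.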

Alfred Luu · Accepted Answer

You don't need to handle login. If you open each product's own page (the child link from the listing), the product information is typically exposed in its meta tags, and this site is no exception to that rule.

import requests
from bs4 import BeautifulSoup

num_pages = 2

# Iterate over each listing page
for page in range(1, num_pages + 1):
    url = f'https://www.cotswold-fayre.co.uk/products/?page={page}'
    response = requests.get(url)

    soup = BeautifulSoup(response.content, 'html.parser')
    # Each product on the listing page links to its own detail page
    links = soup.find_all('a', {'class': 'product-item-link'})
    for item in links:
        # Fetch the detail page; no login is needed for this request
        sub_response = requests.get(item['href'])
        soup_item = BeautifulSoup(sub_response.content, 'html.parser')
        # The <meta> tags of the detail page carry the product information
        metas = soup_item.find_all('meta')
        print(item['href'])
        for meta in metas:
            print(meta)

https://www.cotswold-fayre.co.uk/soda-classic-lime-vodka-soda-hard-seltzer-5-abv-12-x-330ml.html

The meta tags printed for this page show that the price is 19.4 GBP.
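To pull the price out instead of printing every meta tag, something like the following sketch might work. It assumes the price is carried in `product:price:amount` and `product:price:currency` meta properties; those property names are an assumption, not confirmed by the output above, so check the printed tags and adjust them. The function name `extract_price` is hypothetical.

```python
from bs4 import BeautifulSoup


def extract_price(html):
    """Return (amount, currency) read from product meta tags, or None if absent.

    The property names below are assumed; verify them against the real page.
    """
    soup = BeautifulSoup(html, "html.parser")
    amount = soup.find("meta", attrs={"property": "product:price:amount"})
    currency = soup.find("meta", attrs={"property": "product:price:currency"})
    if amount is None or not amount.get("content"):
        return None
    currency_code = currency.get("content") if currency is not None else None
    return float(amount["content"]), currency_code
```

Inside the loop above, this would be called as `extract_price(sub_response.content)`.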
