Vince Hadi

Reputation: 3

How do I find the 'a' tags to scrape the data?

I need to scrape product data from this website: https://shop.freedompop.com/products?page=1

I use BeautifulSoup to parse the HTML, and I found that I need every 'a' element with class_="product-results-item-link layout-row flex-gt-sm-33 flex-50".

I tried containers = html_soup.find_all('a', class_="product-results-item-link layout-row flex-gt-sm-33 flex-50"), but nothing is found:

    from requests import get
    from bs4 import BeautifulSoup
    from time import sleep
    from random import randint
    import pandas as pd

    product_names = []
    status = []
    ori_prices = []
    sale_prices = []

    headers = {"Accept-Language": "en-US, en;q=0.5"}

    pages = [str(i) for i in range(1,2)]
    #pages = [str(i) for i in range(1,24)]

    for page in pages:

        response = get('https://shop.freedompop.com/products?page=' + page, headers=headers)
        sleep(2)

        html_soup = BeautifulSoup(response.text, 'html.parser')

        containers = html_soup.find_all('a', class_="product-results-item-link layout-row flex-gt-sm-33 flex-50")

        print(containers)

I expect containers to hold 18 elements, but the actual output is [].

Upvotes: 0

Views: 69

Answers (2)

Pankaj

Reputation: 939

The website loads all of its product entries dynamically through an API, so they are not in the HTML that requests receives. You can call that API directly and get the data:

https://shop.freedompop.com/api/shop/store/555/item?page=1&pageSize=500&sort=RELEVANCE

from requests import get

# The endpoint returns JSON, so decode it directly; no HTML parsing is needed
response = get('https://shop.freedompop.com/api/shop/store/555/item?pageSize=410&sort=RELEVANCE')
parsed_response = response.json()

# Each entry in 'results' is one product
for index, value in enumerate(parsed_response.get('results')):
    print(index, value)
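
If you want to confirm for yourself that the links are missing from the raw HTML (rather than find_all being at fault), a rough check like the one below may help. It is only a sketch: it uses the standard library's html.parser so it runs without extra packages, and the names AnchorClassCounter and count_anchors are made up for this example. It matches on a single class name instead of the full class string, which is less brittle than comparing all four classes at once.

```python
from html.parser import HTMLParser


class AnchorClassCounter(HTMLParser):
    """Count <a> tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        # A class attribute may be missing or empty, so fall back to ''
        classes = (dict(attrs).get('class') or '').split()
        if self.class_name in classes:
            self.count += 1


def count_anchors(html, class_name):
    """Return how many matching <a> tags appear in the given HTML text."""
    parser = AnchorClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Usage with the question's response object (needs network access):
# print(count_anchors(response.text, 'product-results-item-link'))
```

If this prints 0 for response.text while the browser shows products, the items are being added by JavaScript after the page loads. With BeautifulSoup, html_soup.select('a.product-results-item-link') performs a comparable single-class match.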

Upvotes: 1

chitown88

Reputation: 28630

As stated by Pankaj (so accept his answer, as I'm just expanding upon his initial response), use the request URL to get the data in a nice JSON format. You can also alter the params (i.e. change 'pageSize' to '500') to get more products than just the 18 on the first page:

import requests


url = 'https://shop.freedompop.com/api/shop/store/555/item'

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
params = {
    'page': '1',
    'pageSize': '18',
    'sort': 'RELEVANCE',
}

jsonData = requests.get(url, headers=headers, params=params).json()

for product in jsonData['results']:
    print(product['title'])

Output:

Netgear Unite Mobile Hotspot (GSM)
LG Tribute 2, 8GB Blue (CDMA)
Samsung Galaxy S5, 16GB Charcoal Black (CDMA)
Samsung Galaxy S5, 16GB Shimmery White (CDMA)
Samsung Galaxy S5, 16GB Shimmery White (CDMA)
Samsung Galaxy S4 Enhanced, 16GB Black Mist (CDMA)
Kyocera Hydro Vibe, 8GB Black (CDMA)
Samsung Galaxy S4, 16GB White Frost (CDMA)
Samsung Galaxy S4, 16GB White Frost (CDMA)
Motorola Moto E (2nd Generation), 8GB Black (CDMA)
Apple iPhone 5s, 16GB Gold (CDMA)
Samsung Galaxy S4, 16GB Black Mist (CDMA)
Franklin Wireless R850 4G LTE Mobile Hotspot (CDMA)
Apple iPhone 6, 16GB Space Gray (CDMA)
Samsung Galaxy S4 Enhanced, 16GB White Frost (CDMA)
Huawei Union, 8GB Black (CDMA)
Samsung Galaxy S5, 16GB Copper Gold (CDMA)
Samsung Galaxy S4 Enhanced, 16GB Black Mist (CDMA)
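
Since the question collects several fields per product, here is a small sketch of turning the 'results' list into flat rows. Only the 'title' key is confirmed by the output above; to_rows and field_map are hypothetical names, and any other key you add to field_map has to be checked against the real JSON first.

```python
def to_rows(products, field_map):
    """Map each product dict to a flat row.

    field_map maps an output column name to a key in the product dict.
    Missing keys become None instead of raising KeyError.
    """
    return [
        {column: product.get(key) for column, key in field_map.items()}
        for product in products
    ]


# 'title' is the only key shown in this answer; other keys are unknown here
field_map = {'product_name': 'title'}

# Stand-in for jsonData['results'], using one title from the output above
sample = [{'title': 'Netgear Unite Mobile Hotspot (GSM)'}]

rows = to_rows(sample, field_map)
print(rows)
```

The resulting list of dicts can be passed to pd.DataFrame(rows) if you want a table, as the pandas import in the question suggests.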

Changing params:

import requests


url = 'https://shop.freedompop.com/api/shop/store/555/item'

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
params1 = {
    'page': '1',
    'pageSize': '18',
    'sort': 'RELEVANCE',
}

params2 = {
    'page': '1',
    'pageSize': '500',
    'sort': 'RELEVANCE',
}

jsonData = requests.get(url, headers=headers, params=params1).json()
print(len(jsonData['results']))

jsonData = requests.get(url, headers=headers, params=params2).json()
print(len(jsonData['results']))

Output:

18
405
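
If you would rather keep the page-by-page loop from the question instead of one large pageSize, something along these lines might work. Treat it as a sketch under assumptions: it assumes the API returns an empty 'results' list once you go past the last page, which I have not verified, and fetch_all_products, get_page and max_pages are names invented for this example. The request itself is passed in as a function so the loop can be tried without network access.

```python
from typing import Callable, Dict, List


def fetch_all_products(get_page: Callable[[int], Dict], max_pages: int = 50) -> List[Dict]:
    """Collect 'results' from consecutive pages until a page comes back empty.

    get_page(page_number) must return the decoded JSON for that page.
    Stopping on an empty 'results' list is an assumption about the API;
    max_pages is a safety cap in case that assumption does not hold.
    """
    products = []
    for page in range(1, max_pages + 1):
        results = get_page(page).get('results') or []
        if not results:
            break
        products.extend(results)
    return products


# Usage with the url and headers defined above (needs network access):
# import requests
# products = fetch_all_products(
#     lambda page: requests.get(
#         url,
#         headers=headers,
#         params={'page': str(page), 'pageSize': '18', 'sort': 'RELEVANCE'},
#     ).json()
# )
# print(len(products))
```

Adding a short sleep inside get_page, as in the question, keeps the request rate polite.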

Upvotes: 1
