james joyce

Reputation: 493

Unable to find a link on a product page

I am trying to build a list of the links that appear on a product page.

I have several search URLs, and for each one I want to collect the product page links it returns.

For brevity, I am posting the code for a single URL.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://funskoolindia.com/products.php?search=9723100")
soup = BeautifulSoup(r.content, 'html.parser')
for a_tag in soup.find_all('a', class_='product-bg-panel', href=True):
    print('href: ', a_tag['href'])

This is what it should print: https://funskoolindia.com/product_inner_page.php?product_id=1113

Upvotes: 1

Views: 78

Answers (3)

chiko360

Reputation: 43

Try this: print('href: ', a_tag.get("href")), and pass features="lxml" to the BeautifulSoup constructor.

Upvotes: 1

Andrej Kesely

Reputation: 195603

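The data is loaded dynamically through JavaScript from a different URL, so it is not present in the HTML that requests receives. One solution is to use selenium, which executes the JavaScript and loads the links that way.

Another solution is to use the re module to extract the required key from the page and then request the data URL manually: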

import re
import requests
from bs4 import BeautifulSoup

url = 'https://funskoolindia.com/products.php?search=9723100'
data_url = 'https://funskoolindia.com/admin/load_data.php'

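# The page embeds a key in its JavaScript; the data endpoint expects it back.
page_html = requests.get(url).text
checkbox_key = re.findall(r'var checkboxKey = "(.*?)";', page_html)[0]

data = {'page': '1',
        'sort_val': 'new',
        'product_view_val': 'grid',
        'show_list': '12',
        'brand_id': '',
        'checkboxKey': checkbox_key}

# POST the form data to the endpoint that returns the product listing HTML.
soup = BeautifulSoup(requests.post(data_url, data=data).text, 'lxml')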

for a in soup.select('#list-view .product-bg-panel > a[href]'):
    print('https://funskoolindia.com/' + a['href'])

Prints:

https://funskoolindia.com/product_inner_page.php?product_id=1113
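
Indexing the re.findall result with [0] raises an IndexError if the page ever stops embedding the key. Below is a minimal sketch of a more defensive extraction, assuming the page still declares the key as var checkboxKey = "..."; the helper name extract_checkbox_key and the sample HTML are hypothetical and only for illustration:

```python
import re
from typing import Optional

# Pattern taken from the answer above; it assumes the key is declared as
# var checkboxKey = "..."; in the page's inline JavaScript.
KEY_PATTERN = re.compile(r'var checkboxKey = "(.*?)";')

def extract_checkbox_key(page_html: str) -> Optional[str]:
    """Return the embedded key, or None if the page does not contain one."""
    match = KEY_PATTERN.search(page_html)
    return match.group(1) if match else None

# Hypothetical sample input, for illustration only (not the real page source).
sample = '<script>var checkboxKey = "abc123";</script>'
print(extract_checkbox_key(sample))           # abc123
print(extract_checkbox_key('<p>no key</p>'))  # None
```

Checking for None before building the POST data makes it possible to skip or log a search URL instead of crashing partway through a list of them.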

Upvotes: 1

Ajax1234

Reputation: 71471

The site is dynamic, so you can use selenium to render the page before parsing it:

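from bs4 import BeautifulSoup as soup
from selenium import webdriver

# Selenium 3 style; newer releases take the driver path via a Service object.
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://funskoolindia.com/products.php?search=9723100')
# Parse the rendered page and collect the unique hrefs of the product links.
panels = soup(d.page_source, 'html.parser').find_all('div', {'class': 'product-media light-bg'})
results = [*{i.a['href'] for i in panels}]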

Output:

['product_inner_page.php?product_id=1113']
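
The hrefs collected this way are relative, while the question asks for the full URL. Below is a small sketch of resolving them against the page URL with urllib.parse.urljoin from the standard library; the helper name to_absolute is hypothetical, and the input list is the output shown above:

```python
from urllib.parse import urljoin

# Page the relative links were scraped from (the URL used in the answers).
BASE_URL = 'https://funskoolindia.com/products.php?search=9723100'

def to_absolute(hrefs, base_url=BASE_URL):
    """Resolve each relative href against the page it was found on."""
    return [urljoin(base_url, href) for href in hrefs]

results = ['product_inner_page.php?product_id=1113']
print(to_absolute(results))
# ['https://funskoolindia.com/product_inner_page.php?product_id=1113']
```

Using urljoin rather than string concatenation also handles hrefs that are already absolute or that start with a slash.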

Upvotes: 2
