max

Reputation: 693

BeautifulSoup not returning child elements

I've tried this a million different ways and can't figure out why BeautifulSoup is as unpredictable as all my exes.

I'm just trying to copy a table into a pandas DataFrame. There are about 280 rows in the table.

Here's the url:

https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc=

Here's part of my code that doesn't work:

import requests
from bs4 import BeautifulSoup

# req_headers is defined elsewhere in the script (not shown)
with requests.Session() as s:
    url = "https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc="
    r = s.get(url, headers=req_headers)

# parse the response body into a soup object
soup = BeautifulSoup(r.content, 'lxml')
rows = soup.find_all("div", {"id": "diamonds_search_table"})
rows

Here's the area of the page where the table is:

[screenshot]

What can I try next?

Upvotes: 2

Views: 66

Answers (2)

Andrej Kesely

Reputation: 195408

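The table data is loaded dynamically via JavaScript, so it is not part of the HTML that requests receives from the page URL. You can use the requests module to call the endpoint that the page itself loads the data from.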

For example:

import json
import requests


search_parameters = {
    'shapes': "Round",
    'cuts': "Fair,Good,Very Good,Ideal,Super Ideal",
    'colors': "J,I,H,G,F,E,D",
    'clarities': "SI2,SI1,VS2,VS1,VVS2,VVS1,IF,FL",
    'polishes': "Good,Very Good,Excellent",
    'symmetries': "Good,Very Good,Excellent",
    'fluorescences': "Very Strong,Strong,Medium,Faint,None",
    'min_carat': "0.25",
    'max_carat': "11.58",
    'min_table': "50.00",
    'max_table': "86.00",
    'min_depth': "46.20",
    'max_depth': "629.00",
    'min_price': "420",
    'max_price': "1258930",
    'stock_number': "",
    'row': "0",
    'page': "1",
    'requestedDataSize': "200",
    'order_by': "price",
    'order_method': "asc",
    'currency': "$",
    'has_v360_video': "",
    'dedicated': "",
    'sid': "",
    'min_ratio': "1.00",
    'max_ratio': "2.75",
    'shipping_day': "",
    'MIN_PRICE': "420",
    'MAX_PRICE': "1258930",
    'MIN_CARAT': "0.25",
    'MAX_CARAT': "11.58",
    'MIN_TABLE': "45",
    'MAX_TABLE': "86",
    'MIN_DEPTH': "46.2",
    'MAX_DEPTH': "629"
}

data = requests.get('https://www.brilliantearth.com/loose-diamonds/list/', params=search_parameters).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for d in data['diamonds']:
    print('{:<30} {:<15} {}'.format(d['title'], d['cut'], d['price']))

Prints:

0.30 Carat Round Diamond       Very Good       420
0.30 Carat Round Diamond       Very Good       420
0.30 Carat Round Diamond       Ideal           430
0.30 Carat Round Diamond       Ideal           430
0.30 Carat Round Diamond       Good            430
0.30 Carat Round Diamond       Ideal           430
0.30 Carat Round Diamond       Very Good       430
0.25 Carat Round Diamond       Super Ideal     430
0.30 Carat Round Diamond       Very Good       430
0.32 Carat Round Diamond       Ideal           430

... and so on.
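
Since the goal in the question is a pandas DataFrame, here is a minimal sketch of going from the parsed JSON to a frame. It assumes the response has the shape used above (a 'diamonds' list of dicts with at least 'title', 'cut' and 'price'). The helper name to_frame and the sample payload are made up for illustration; the sample values are copied from the printed output, and storing the price as an integer is a guess.

```python
import pandas as pd


def to_frame(payload, columns=None):
    """Build a DataFrame from the 'diamonds' list of a parsed JSON response."""
    df = pd.DataFrame(payload.get("diamonds", []))
    if columns is not None:
        # keep only the requested columns, in the requested order
        df = df.reindex(columns=columns)
    return df


# Hypothetical stand-in for the real response, so the sketch runs offline.
sample = {
    "diamonds": [
        {"title": "0.30 Carat Round Diamond", "cut": "Very Good", "price": 420},
        {"title": "0.30 Carat Round Diamond", "cut": "Ideal", "price": 430},
        {"title": "0.25 Carat Round Diamond", "cut": "Super Ideal", "price": 430},
    ]
}

df = to_frame(sample, columns=["title", "cut", "price"])
print(df)
```

With the real response, df = to_frame(data) would take the place of the print loop. Note that the request asks for 200 rows ('requestedDataSize'), so the roughly 280 rows from the question may need a second request with 'page' set to "2" and the two frames joined with pd.concat. That the endpoint pages this way is only a guess from the parameter names.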

Upvotes: 1

Humayun Ahmad Rajib

Reputation: 1560

You can use Selenium to render the page in a real browser and then parse the resulting HTML. You can try:

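from bs4 import BeautifulSoup
from selenium import webdriver

# open the page in a real browser so the JavaScript runs
driver = webdriver.Firefox()
driver.get('https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc=')

# grab the rendered HTML, then close the browser
html = driver.page_source
driver.quit()

# name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'lxml')

rows = soup.find_all("div", {"id": "diamonds_search_table"})
print(rows)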

This prints the table container together with its rows, like below:

[<div class="search-table" id="diamonds_search_table" style="position: relative; height: 34000px;">
<div class="inner item" data-have="true" data-position="0" style="position: absolute; width: 100%; height: 34px;top:0px;"><a class="td-n2" href="/rings/cyorings/view_diamond/9361809/?sid=3755106&amp;first=diamond&amp;show_diamond_tab=true"></a><table border="0" cellpadding="0" cellspacing="0" class="table-striped table-hover search-result-table" width="100%"><tbody><tr class="search-item"><td data-id="9361809" onclick="dtl.stop_jump();" scope="col" width="7%"><div class="checkbox checkbox-ty4"><label><input class="hidden"/><span class="sr-only">checkbox</span><i class="icons-checkbox"></i></label></div></td><td scope="col" width="9%">Round</td><td scope="col" width="9%">0.30</td><td scope="col" width="8%">H</td><td scope="col" width="8%">SI2</td><td scope="col" width="12%">Very Good</td><td scope="col" width="8%">GIA</td><td scope="col" width="12%">Botswana Sort</td><td class="width_ratio_hide" scope="col" width="8%">1</td><td scope="col" width="10%">$420</td><td scope="col" width="7%"><span class="view">View</span></td></tr></tbody></table></div><div class="inner item" data-have="true" data-position="34" style="position: absolute; width: 100%; height: 34px;top:34px;"><a class="td-n2" href="/rings/cyorings/view_diamond/9391074/?sid=3755106&amp;first=diamond&amp;show_diamond_tab=true"></a><table border="0" cellpadding="0" cellspacing="0" class="table-striped table-hover search-result-table" width="100%"><tbody><tr class="search-item"><td data-id="9391074"


and so on...........]
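
Once the rendered HTML is available, the rows can be pulled out cell by cell. This is a rough sketch, assuming the markup looks like the output above: a div with id diamonds_search_table that contains tr elements with class search-item. The names extract_rows and SAMPLE_HTML are made up here, and the sample is a trimmed copy of the first row shown above.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first row from the output above (attributes shortened).
SAMPLE_HTML = """
<div class="search-table" id="diamonds_search_table">
<div class="inner item" data-have="true" data-position="0">
<table class="search-result-table"><tbody><tr class="search-item">
<td data-id="9361809"><div class="checkbox checkbox-ty4"><label><input class="hidden"/>
<span class="sr-only">checkbox</span><i class="icons-checkbox"></i></label></div></td>
<td>Round</td><td>0.30</td><td>H</td><td>SI2</td><td>Very Good</td><td>GIA</td>
<td>Botswana Sort</td><td class="width_ratio_hide">1</td><td>$420</td>
<td><span class="view">View</span></td>
</tr></tbody></table></div></div>
"""


def extract_rows(html):
    """Return one list of cell texts per tr.search-item inside the table div."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id="diamonds_search_table")
    if container is None:
        return []
    rows = []
    for tr in container.find_all("tr", class_="search-item"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Assumption from the sample: the first cell is the checkbox and the
        # last is the "View" link, so both are dropped.
        rows.append(cells[1:-1])
    return rows


print(extract_rows(SAMPLE_HTML))
```

Passing the result to pandas.DataFrame would then give a frame. The row markup shown does not include column names, so those would have to be supplied by hand. If the list comes back empty on the live page, the rows may not have been rendered yet when page_source was read; Selenium's WebDriverWait can be used to wait for them.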

Upvotes: 1
