Reputation: 693
I've tried this a million different ways and can't figure out why BeautifulSoup is as unpredictable as all my exes.
I'm just trying to copy a table to a pandas DataFrame. There are about 280 rows in the table.
Here's the url:
https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc=
Here's part of my code that doesn't work:
import requests
from bs4 import BeautifulSoup

req_headers = {"User-Agent": "Mozilla/5.0"}  # placeholder; the original headers weren't shown

with requests.Session() as s:
    url = "https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc="
    r = s.get(url, headers=req_headers)
    # parse the downloaded page
    soup = BeautifulSoup(r.content, 'lxml')
    rows = soup.find_all("div", {"id": "diamonds_search_table"})
print(rows)
The table is in the div with id "diamonds_search_table" on that page.
What can I try next?
Upvotes: 2
Views: 66
Reputation: 195408
The data is loaded dynamically via JavaScript, so it isn't in the HTML that requests downloads. You can use the requests module to simulate the request the page makes for that data.
For example:
import json
import requests
search_parameters = {
    'shapes': "Round",
    'cuts': "Fair,Good,Very Good,Ideal,Super Ideal",
    'colors': "J,I,H,G,F,E,D",
    'clarities': "SI2,SI1,VS2,VS1,VVS2,VVS1,IF,FL",
    'polishes': "Good,Very Good,Excellent",
    'symmetries': "Good,Very Good,Excellent",
    'fluorescences': "Very Strong,Strong,Medium,Faint,None",
    'min_carat': "0.25",
    'max_carat': "11.58",
    'min_table': "50.00",
    'max_table': "86.00",
    'min_depth': "46.20",
    'max_depth': "629.00",
    'min_price': "420",
    'max_price': "1258930",
    'stock_number': "",
    'row': "0",
    'page': "1",
    'requestedDataSize': "200",
    'order_by': "price",
    'order_method': "asc",
    'currency': "$",
    'has_v360_video': "",
    'dedicated': "",
    'sid': "",
    'min_ratio': "1.00",
    'max_ratio': "2.75",
    'shipping_day': "",
    'MIN_PRICE': "420",
    'MAX_PRICE': "1258930",
    'MIN_CARAT': "0.25",
    'MAX_CARAT': "11.58",
    'MIN_TABLE': "45",
    'MAX_TABLE': "86",
    'MIN_DEPTH': "46.2",
    'MAX_DEPTH': "629"
}
data = requests.get('https://www.brilliantearth.com/loose-diamonds/list/', params=search_parameters).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for d in data['diamonds']:
    # title left-aligned in 30 chars, cut in 15, then the price
    print('{:<30} {:<15} {}'.format(d['title'], d['cut'], d['price']))
Prints:
0.30 Carat Round Diamond       Very Good       420
0.30 Carat Round Diamond       Very Good       420
0.30 Carat Round Diamond       Ideal           430
0.30 Carat Round Diamond       Ideal           430
0.30 Carat Round Diamond       Good            430
0.30 Carat Round Diamond       Ideal           430
0.30 Carat Round Diamond       Very Good       430
0.25 Carat Round Diamond       Super Ideal     430
0.30 Carat Round Diamond       Very Good       430
0.32 Carat Round Diamond       Ideal           430
... and so on.
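The question asks for a pandas DataFrame, and with about 280 rows and requestedDataSize set to 200, a single request presumably returns at most 200 of them. Below is a minimal sketch that requests successive pages and builds a DataFrame. It assumes (not verified against the site) that the endpoint honours the page and requestedDataSize parameters from the dictionary above, and that the last page comes back short or empty; fetch_all_diamonds, make_requests_fetcher and to_dataframe are hypothetical helper names. The paging logic takes a fetch function as an argument so it can be tried without hitting the network.

```python
from typing import Callable, Dict, List

URL = 'https://www.brilliantearth.com/loose-diamonds/list/'


def fetch_all_diamonds(fetch_page: Callable[[int], List[Dict]],
                       page_size: int = 200,
                       max_pages: int = 20) -> List[Dict]:
    """Call fetch_page(1), fetch_page(2), ... and concatenate the results.

    Stops at an empty page, at a page shorter than page_size (assumed to be
    the last one), or after max_pages as a safety cap.
    """
    diamonds: List[Dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break
        diamonds.extend(batch)
        if len(batch) < page_size:
            break
    return diamonds


def make_requests_fetcher(params: Dict[str, str]) -> Callable[[int], List[Dict]]:
    """Return a fetch_page function that queries the JSON endpoint with requests."""
    import requests  # imported here so the paging logic above has no dependencies

    def fetch_page(page: int) -> List[Dict]:
        query = dict(params, page=str(page))  # copy params, overriding 'page'
        resp = requests.get(URL, params=query, timeout=30)
        resp.raise_for_status()
        return resp.json().get('diamonds', [])

    return fetch_page


def to_dataframe(diamonds: List[Dict]):
    """Build a DataFrame with one row per diamond dict."""
    import pandas as pd
    return pd.DataFrame(diamonds)
```

With the search_parameters dictionary from this answer, usage would be df = to_dataframe(fetch_all_diamonds(make_requests_fetcher(search_parameters))). If the result contains duplicate diamonds, the page parameter probably isn't being honoured and the stopping rule needs adjusting.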
Upvotes: 1
Reputation: 1560
You can use Selenium to load the page in a real browser, so the JavaScript runs, and then parse the rendered HTML. You can try:
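from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Firefox()
driver.get('https://www.brilliantearth.com/design-your-own-engagement-ring/?sid=3755106&dc=')
# wait until the JavaScript has rendered at least one result row
WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#diamonds_search_table tr.search-item")))
html = driver.page_source
driver.quit()
soup = BeautifulSoup(html, 'lxml')
rows = soup.find_all("div", {"id": "diamonds_search_table"})
print(rows)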
You will get all the rows, like this:
[<div class="search-table" id="diamonds_search_table" style="position: relative; height: 34000px;">
<div class="inner item" data-have="true" data-position="0" style="position: absolute; width: 100%; height: 34px;top:0px;"><a class="td-n2" href="/rings/cyorings/view_diamond/9361809/?sid=3755106&first=diamond&show_diamond_tab=true"></a><table border="0" cellpadding="0" cellspacing="0" class="table-striped table-hover search-result-table" width="100%"><tbody><tr class="search-item"><td data-id="9361809" onclick="dtl.stop_jump();" scope="col" width="7%"><div class="checkbox checkbox-ty4"><label><input class="hidden"/><span class="sr-only">checkbox</span><i class="icons-checkbox"></i></label></div></td><td scope="col" width="9%">Round</td><td scope="col" width="9%">0.30</td><td scope="col" width="8%">H</td><td scope="col" width="8%">SI2</td><td scope="col" width="12%">Very Good</td><td scope="col" width="8%">GIA</td><td scope="col" width="12%">Botswana Sort</td><td class="width_ratio_hide" scope="col" width="8%">1</td><td scope="col" width="10%">$420</td><td scope="col" width="7%"><span class="view">View</span></td></tr></tbody></table></div><div class="inner item" data-have="true" data-position="34" style="position: absolute; width: 100%; height: 34px;top:34px;"><a class="td-n2" href="/rings/cyorings/view_diamond/9391074/?sid=3755106&first=diamond&show_diamond_tab=true"></a><table border="0" cellpadding="0" cellspacing="0" class="table-striped table-hover search-result-table" width="100%"><tbody><tr class="search-item"><td data-id="9391074"
... and so on ...]
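In the rendered markup above, each result is wrapped in its own table, and the values sit in the td cells of a tr with class search-item. Below is a minimal sketch that pulls those cells out. It uses the standard library's html.parser, swapped in for BeautifulSoup so it needs no extra install. SearchRowParser and parse_search_rows are hypothetical names, and the search-item class is taken from the sample output, so the sketch breaks if the site changes its markup. The first and last cells hold the checkbox and the View link rather than data.

```python
from html.parser import HTMLParser
from typing import List


class SearchRowParser(HTMLParser):
    """Collect the text of each td inside every <tr class="search-item">."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_row = False
        self._in_cell = False
        self._cell_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'tr' and 'search-item' in classes:
            self._in_row = True
            self.rows.append([])  # start a new result row
        elif tag == 'td' and self._in_row:
            self._in_cell = True
            self._cell_text = []

    def handle_endtag(self, tag):
        if tag == 'td' and self._in_cell:
            # join all text fragments seen inside this cell (nested spans included)
            self.rows[-1].append(''.join(self._cell_text).strip())
            self._in_cell = False
        elif tag == 'tr' and self._in_row:
            self._in_row = False

    def handle_data(self, data):
        if self._in_cell:
            self._cell_text.append(data)


def parse_search_rows(html: str) -> List[List[str]]:
    """Return one list of cell strings per search-item row in html."""
    parser = SearchRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

With the html variable from the Selenium code, rows = parse_search_rows(html) followed by pd.DataFrame([r[1:-1] for r in rows]) would drop the checkbox and View cells and give one column per value (Round, 0.30, H, SI2, ... in the sample). The column names aren't in the row markup, so you would set them yourself.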
Upvotes: 1