Reputation: 35
I have the code below from an answer to another question of mine.
It can pull the data from each listing page. My next problem is how to extract the details of each dress, such as the model's name, the model's size, and the features.
On top of that, there can be more than one model per dress (for example, the BOHO BIRD Amore Wrap Dress has three models, wearing sizes 10, 14, and 16).
import json
import requests
from bs4 import BeautifulSoup

cookies = {
    "ServerID": "1033",
    "__zlcmid": "10tjXhWpDJVkUQL",
}
headers = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"
}

def extract_info(bs: BeautifulSoup, tag: str, attr_value: str) -> list:
    return [i.text.strip() for i in bs.find_all(tag, {"itemprop": attr_value})]

all_pages = []
for page in range(1, 29):
    print(f"{all_pages}\nFound: {len(all_pages)} dresses.")
    current_page = f"https://www.birdsnest.com.au/womens/dresses?page={page}"
    source = requests.get(current_page, headers=headers, cookies=cookies)
    soup = BeautifulSoup(source.content, 'html.parser')

    brand = extract_info(soup, tag="strong", attr_value="brand")
    name = extract_info(soup, tag="h2", attr_value="name")
    price = extract_info(soup, tag="span", attr_value="price")

    all_pages.extend(
        [
            {
                "brand": b,
                "name": n,
                "price": p,
            } for b, n, p in zip(brand, name, price)
        ]
    )

with open("all_the_dresses2.json", "w") as jf:
    json.dump(all_pages, jf, indent=4)
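To scrape the details of each dress, you also need the URL of each product page from the listing. A minimal sketch of the idea, assuming each `h2[itemprop="name"]` card wraps its link in an anchor tag (the sample markup and selector here are assumptions; inspect the real listing HTML to confirm):

```python
from bs4 import BeautifulSoup

# Invented sample of what a listing card might look like -- verify against the real page.
sample = """
<h2 itemprop="name"><a href="/brands/boho-bird/73067-amore-wrap-dress">Amore Wrap Dress</a></h2>
<h2 itemprop="name"><a href="/brands/boho-bird/00000-another-dress">Another Dress</a></h2>
"""

soup = BeautifulSoup(sample, "html.parser")
base = "https://www.birdsnest.com.au"
# Collect an absolute URL for every product card on the page.
links = [base + a["href"] for a in soup.select('h2[itemprop="name"] a[href]')]
print(links)
```

Each URL collected this way can then be fetched individually to scrape the per-dress details.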
Upvotes: 0
Views: 445
Reputation: 2237
The information that you want is generated dynamically, so you won't get it with requests. I suggest you use selenium for that.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
link = 'https://www.birdsnest.com.au/brands/boho-bird/73067-amore-wrap-dress'
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
driver = webdriver.Chrome('C:/Users/../Downloads/../chromedriver.exe', options=options)
driver.get(link)
time.sleep(3)
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.close()
page_new = soup.find('div', class_='model-info clearfix')
results = page_new.find_all('p')
for result in results:
    print(result.text)
Output
Marnee usually wears a size 8.
She is wearing a size 10 in this style.
Her height is 178 cm.
Show Marnee’s body measurements
Marnee’s body measurements are:
Bust 81 cm
Waist 64 cm
Hips 89 cm
<div class="model-info-header">
<p>
<strong><span class="model-info__name">Marnee</span></strong> usually wears a size <strong><span class="model-info__standard-size">8</span></strong>.
She is wearing a size <strong><span class="model-info__wears-size">10</span></strong> in this style.
</p>
<p class="model-info-header__height">Her height is <strong><span class="model-info__height">178 cm</span></strong>.</p>
<p>
<span class="js-model-info-more model-info__link model-info-header__more">Show <span class="model-info__name">Marnee</span>’s body measurements</span>
</p>
</div>
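Once selenium has rendered the page, the markup above can be parsed with BeautifulSoup. Since a dress can show several models, using find_all instead of find collects every model-info block. A sketch reusing the classes visible in the HTML above (run here against that snippet rather than the live page):

```python
from bs4 import BeautifulSoup

# The model-info markup shown above, as it would appear in driver.page_source.
html = '''
<div class="model-info clearfix">
  <div class="model-info-header">
    <p>
      <strong><span class="model-info__name">Marnee</span></strong> usually wears a size
      <strong><span class="model-info__standard-size">8</span></strong>.
      She is wearing a size <strong><span class="model-info__wears-size">10</span></strong> in this style.
    </p>
    <p class="model-info-header__height">Her height is <strong><span class="model-info__height">178 cm</span></strong>.</p>
  </div>
</div>
'''

soup = BeautifulSoup(html, "html.parser")
models = []
# A dress with three models would have three 'model-info clearfix' blocks.
for block in soup.find_all("div", class_="model-info clearfix"):
    models.append({
        "name": block.find("span", class_="model-info__name").text,
        "usually_wears": block.find("span", class_="model-info__standard-size").text,
        "wearing": block.find("span", class_="model-info__wears-size").text,
        "height": block.find("span", class_="model-info__height").text,
    })
print(models)
```

This gives you structured records per model instead of loose text, which is easier to merge with the brand/name/price data from the listing scraper.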
With requests you will miss all the data in bold, which is exactly what you want.
Upvotes: 1