Reputation: 115
I am using Python for web scraping (I'm new to this) and am trying to grab the brand name from a website. It is not visible on the page, but I have found the element that contains it:
<span itemprop="Brand" style="display:none;">Revlon</span>
I want to extract the "Revlon" text from the HTML. I am currently using requests-html and have tried selecting the element by its CSS selector and reading its text:
brandname = r.html.find('body > div:nth-child(96) > span:nth-child(2)', first=True).text.strip()
but find returns None here, so calling .text on it raises an error. I am not sure how to target this element specifically. Any help would be appreciated.
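A positional selector such as body > div:nth-child(96) > span:nth-child(2) breaks as soon as the page layout shifts, whereas the itemprop="Brand" attribute identifies the element directly. Also, display:none only affects how the browser renders the span; an HTML parser still sees it and its text. The sketch below uses only Python's standard-library html.parser on the snippet from the question (the HTML is hard-coded rather than downloaded), and the class name BrandParser is made up for illustration. With requests-html, the equivalent attribute selector would presumably be r.html.find('span[itemprop="Brand"]', first=True), though that call is not tested here.

```python
from html.parser import HTMLParser


class BrandParser(HTMLParser):
    """Hypothetical helper: collects the text of every <span itemprop="Brand">.

    Assumes the brand span contains no nested <span> elements.
    """

    def __init__(self):
        super().__init__()
        self._inside = False  # True while we are inside a matching span
        self.brands = []      # text of each matching span, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lower-cases names, not values
        if tag == "span" and ("itemprop", "Brand") in attrs:
            self._inside = True
            self.brands.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.brands[-1] += data


# The snippet from the question; display:none does not hide it from a parser
html = '<span itemprop="Brand" style="display:none;">Revlon</span>'
parser = BrandParser()
parser.feed(html)
print(parser.brands[0].strip())  # Revlon
```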
Upvotes: 0
Views: 4679
Reputation: 36
Try the .find("span", itemprop="Brand") method; I think it will work:
from bs4 import BeautifulSoup
import requests
urlpage = 'https://www.boots.com/revlon-colorstay-makeup-for-normal-dry-skin-10212694'
page = requests.get(urlpage)
# parse the html using beautiful soup and store in variable 'soup'
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.find("span", itemprop="Brand").text)
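If the span is absent (for example, because the site served a different page), soup.find(...) returns None and the .text call fails with an AttributeError, which is the same kind of failure described in the question. A slightly more defensive sketch uses BeautifulSoup's CSS-selector method select_one with an attribute selector, closer to the CSS approach the question started with, and checks for None before reading the text. The function name get_brand is hypothetical, and the HTML is hard-coded instead of coming from requests.get:

```python
from typing import Optional

from bs4 import BeautifulSoup


def get_brand(html: str) -> Optional[str]:
    """Hypothetical helper: text of <span itemprop="Brand">, or None if there is no such span."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS attribute selector: match on itemprop instead of the element's position
    span = soup.select_one('span[itemprop="Brand"]')
    return span.get_text(strip=True) if span is not None else None


# Hard-coded snippets; in practice pass page.content from requests.get(urlpage)
print(get_brand('<span itemprop="Brand" style="display:none;">Revlon</span>'))  # Revlon
print(get_brand('<p>no brand here</p>'))  # None
```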
Upvotes: 1
Reputation: 663
Here is a working solution with Selenium:
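from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
website = 'https://www.boots.com/revlon-colorstay-makeup-for-normal-dry-skin-10212694'
driver.get(website)
# find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...)
brand_name = driver.find_element(By.XPATH, '//*[@id="estore_product_title"]/h1')
# assumes the brand is the first word of the product title
print('brand name: ' + brand_name.text.split(' ')[0])
driver.quit()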
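Selenium's WebElement.text returns only rendered (visible) text, so reading .text from the hidden itemprop span would give an empty string; that is presumably why the answer above goes through the product title instead. Reading the element's textContent DOM property returns the text regardless of visibility. A minimal sketch, assuming a Selenium 4 driver that has already loaded the page; hidden_brand is a hypothetical helper name, and "css selector" is the string value of By.CSS_SELECTOR:

```python
def hidden_brand(driver):
    """Hypothetical helper: read the brand from the hidden itemprop span.

    Assumes `driver` is a Selenium 4 WebDriver that has already loaded the
    product page and that the page contains <span itemprop="Brand">.
    """
    span = driver.find_element("css selector", 'span[itemprop="Brand"]')
    # .text would be "" because the span is display:none;
    # textContent is read from the DOM regardless of visibility
    return span.get_attribute("textContent").strip()


# Usage, after driver.get(website) and before driver.quit():
# print('brand name: ' + hidden_brand(driver))
```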
You can also use BeautifulSoup for that:
from bs4 import BeautifulSoup
import requests
urlpage = 'https://www.boots.com/revlon-colorstay-makeup-for-normal-dry-skin-10212694'
# query the website and return the html to the variable 'page'
page = requests.get(urlpage)
# parse the html using beautiful soup and store in variable 'soup'
soup = BeautifulSoup(page.content, 'html.parser')
name = soup.find(id='estore_product_title')
# split() with no argument also skips any leading whitespace or newlines
print(name.text.split()[0])
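Taking the first word of the product title assumes the brand is always a single word at the start of the title, which will not hold for multi-word brand names. One option is to prefer the hidden itemprop span and fall back to the title only when the span is missing. In the sketch below, brand_from_page is an illustrative name and the markup is a simplified, made-up stand-in for the real page (only the estore_product_title id and the itemprop span come from this thread):

```python
from typing import Optional

from bs4 import BeautifulSoup


def brand_from_page(html: str) -> Optional[str]:
    """Hypothetical helper: prefer the itemprop span, fall back to the title's first word."""
    soup = BeautifulSoup(html, "html.parser")

    # 1. The hidden span carries the full brand name, if present and non-empty
    span = soup.find("span", itemprop="Brand")
    if span is not None and span.get_text(strip=True):
        return span.get_text(strip=True)

    # 2. Fallback heuristic: first word of the product title
    title = soup.find(id="estore_product_title")
    if title is not None:
        words = title.get_text().split()  # split() ignores newlines and extra spaces
        if words:
            return words[0]
    return None


# Made-up markup with a two-word brand to show where the heuristic differs
with_span = ('<div id="estore_product_title"><h1>Example Brand Face Cream</h1></div>'
             '<span itemprop="Brand" style="display:none;">Example Brand</span>')
title_only = '<div id="estore_product_title">\n  <h1>Example Brand Face Cream</h1>\n</div>'
print(brand_from_page(with_span))   # Example Brand
print(brand_from_page(title_only))  # Example
```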
Upvotes: 3