I'm trying to extract Some_Product_Title from this block of HTML code:
<div id="titleSection" class="a-section a-spacing-none">
<h1 id="title" class="a-size-large a-spacing-none">
<span id="productTitle" class="a-size-large">
Some_Product_Title
</span>
The lines below are working fine:
page = requests.get(URL, headers = headers)
soup = BeautifulSoup(page.content, 'html.parser')
But the line below is not:
title = soup.find(id="productTitle")
When I try print(title), I get None as the console output.
Does anyone know how to fix this?
Upvotes: 1
Views: 1688
Reputation: 335
You're probably having trouble with .find() because the site you are building the soup from is, in all likelihood, generating its HTML via JavaScript.
If that is the case, try re-parsing the prettified markup before looking up the element by id:
soup1 = BeautifulSoup(page.content, "html.parser")
soup2 = BeautifulSoup(soup1.prettify(), "html.parser")
title = soup2.find(id = "productTitle")
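One way to check whether the element is in the downloaded markup at all is to test for the id directly. This is a minimal sketch: `id_in_markup` is a hypothetical helper, and the two HTML strings are made-up stand-ins for `page.content`, not real responses.

```python
from bs4 import BeautifulSoup


def id_in_markup(markup, element_id):
    """Return True if a tag with this id exists in the parsed markup."""
    soup = BeautifulSoup(markup, "html.parser")
    return soup.find(id=element_id) is not None


# Hypothetical stand-ins for page.content in two situations:
# a page that ships the title in its HTML, and a page that only ships a script shell.
static_html = '<span id="productTitle">Some_Product_Title</span>'
js_shell_html = '<div id="root"></div><script src="app.js"></script>'

print(id_in_markup(static_html, "productTitle"))    # True
print(id_in_markup(js_shell_html, "productTitle"))  # False
```

If the check comes back False for your real `page.content`, the element was probably never sent in the response, so the parser has nothing to find.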
Upvotes: 3
Reputation: 633
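import requests
from bs4 import BeautifulSoup

URL = 'https://your-own.address/some-thing'
# Placeholder headers (an assumption); some sites reject requests without a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
# find_all returns a list of every matching tag (empty if there is none)
title = soup.find_all(id="productTitle")
print(*title)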
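For context, find and find_all report a missing element differently, which matters when reading the console output. A small sketch on an inline HTML string (the string and the id `noSuchId` are made up for illustration):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<span id="productTitle">Some_Product_Title</span>', "html.parser")

found = soup.find(id="productTitle")          # a single Tag
missing = soup.find(id="noSuchId")            # None when nothing matches
found_all = soup.find_all(id="productTitle")  # list containing one Tag
missing_all = soup.find_all(id="noSuchId")    # empty list when nothing matches

print(found.get_text(), missing, len(found_all), len(missing_all))
```

So a printed None points to find coming up empty, while find_all would print [] in the same situation.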
Upvotes: 1
Reputation: 9
BS4 has CSS selectors built in, so you can use:
soup.select('#productTitle')
This would also work:
title = soup.find_all("span", {"id": "productTitle"})
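Here is a rough, self-contained sketch of both calls run against the snippet from the question. The closing `</h1>` and `</div>` tags are my addition so the HTML stands alone, and `select_one` plus `get_text(strip=True)` are just one way to get the bare title string.

```python
from bs4 import BeautifulSoup

# The snippet from the question, with closing tags added (an assumption) so it stands alone
html = """
<div id="titleSection" class="a-section a-spacing-none">
<h1 id="title" class="a-size-large a-spacing-none">
<span id="productTitle" class="a-size-large">
Some_Product_Title
</span>
</h1>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

matches = soup.select("#productTitle")                # list of matching tags
first = soup.select_one("#productTitle")              # first match, or None
same = soup.find_all("span", {"id": "productTitle"})  # list of matching tags

# Guard against None before reading the text, then strip surrounding whitespace
title = first.get_text(strip=True) if first is not None else None
print(title)  # Some_Product_Title
```

Both select and find_all return lists, so index into them or use select_one / find if you only want the first match.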
Upvotes: 1