Reputation: 35
I want to extract just the paragraph text from the HTML source, but the result also includes the colour and style ID data that sits in the same div.
import requests
from bs4 import BeautifulSoup
url = "https://www.nike.com/gb/t/air-max-viva-shoe-ZQTSV8/DB5268-003"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
description = soup.find(
'div', {'class': 'description-preview body-2 css-1pbvugb'}).text
print(description)
Upvotes: 0
Views: 237
Reputation: 11525
If that's your only target on the page, you don't need a full parser, since a parser loads the whole document into memory.
You can compare the run time of the regex approach against the bs4 parser.
Below is a quick approach:
import re
import requests
r = requests.get(
'https://www.nike.com/gb/t/air-max-viva-shoe-ZQTSV8/DB5268-003')
match = re.search(r'descriptionPreview\":\"(.+?)\"', r.text).group(1)
print(match)
Output:
Designed with every woman in mind, the mixed material upper of the Nike Air Max Viva
features a plush collar, detailed patterning and intricate stitching. The new lacing
system uses 2 separate laces constructed from heavy-duty tech chord, letting you find the perfect fit. Mixing comfort with style, it combines Nike Air with a lifted foam
heel for and unbelievable ride that looks as good as it feels.
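One caveat with the regex route: the captured value is a raw JSON string fragment, so escapes such as \u0026 or \" stay undecoded, and .+? stops at the first quote even when that quote is escaped. Below is a minimal sketch of a more forgiving variant. It runs against a made-up sample string rather than the live page; the key name descriptionPreview comes from the snippet above, while extract_preview and sample are hypothetical names used only for illustration.

```python
import json
import re

# Matches a JSON string value, allowing backslash escapes such as \" inside it.
PREVIEW_RE = re.compile(r'"descriptionPreview":"((?:[^"\\]|\\.)*)"')


def extract_preview(page_source):
    """Return the decoded descriptionPreview value, or None if it is absent."""
    found = PREVIEW_RE.search(page_source)
    if found is None:
        return None
    # Re-wrap the fragment in quotes so json.loads decodes the escapes.
    return json.loads('"' + found.group(1) + '"')


# Hypothetical stand-in for r.text; the real page embeds a much larger JSON blob.
sample = r'{"title":"Shoe","descriptionPreview":"Plush \"collar\" \u0026 foam heel.","x":1}'
print(extract_preview(sample))
```

Returning None instead of calling .group(1) directly also avoids an AttributeError if the key is missing from the page.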
If you would rather use bs4, here's a short example:
from bs4 import BeautifulSoup
# r is the response fetched with requests.get() in the snippet above
soup = BeautifulSoup(r.text, 'lxml')
print(soup.select_one('.description-preview').p.string)
Note: I used the lxml parser, as it's the quickest parser according to the bs4 documentation.
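The timing comparison mentioned above could be sketched roughly like this. It is only an illustration: SAMPLE_HTML is a tiny invented stand-in for the product page, with_regex and with_bs4 are hypothetical helper names, and html.parser is used here only so the sketch runs without lxml installed. Real numbers will depend on the page size and the parser you pick.

```python
import re
import timeit

from bs4 import BeautifulSoup

# Hypothetical, tiny stand-in for the product page; the real page is far larger.
SAMPLE_HTML = (
    '<html><body><script>{"descriptionPreview":"A plush collar."}</script>'
    '<div class="description-preview body-2"><p>A plush collar.</p>'
    '<ul><li>Colour Shown: Black</li></ul></div></body></html>'
)


def with_regex(html):
    # Same pattern idea as the regex snippet in this answer.
    return re.search(r'descriptionPreview":"(.+?)"', html).group(1)


def with_bs4(html):
    # html.parser avoids the extra lxml dependency; swap in 'lxml' if installed.
    soup = BeautifulSoup(html, 'html.parser')
    return soup.select_one('.description-preview').p.string


# Run each approach 1000 times and report the total elapsed time.
regex_time = timeit.timeit(lambda: with_regex(SAMPLE_HTML), number=1000)
bs4_time = timeit.timeit(lambda: with_bs4(SAMPLE_HTML), number=1000)
print(f'regex: {regex_time:.4f}s  bs4: {bs4_time:.4f}s')
```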
Upvotes: 1
Reputation:
It seems you want the text of the next <p>:
description = soup.find('div', {'class':'description-preview body-2 css-1pbvugb'}).find_next('p').text
Upvotes: 1
Reputation: 9969
Just chain .find("p") after it:
description = soup.find('div', {'class':'description-preview body-2 css-1pbvugb'}).find("p").text
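To see why the original code picked up the extra data, here is a small self-contained sketch. The markup is invented and only modelled on what the question describes (a paragraph followed by a list inside the same div), so treat the structure as an assumption. It also matches on the single class description-preview, since css-1pbvugb looks like a generated class name that may change.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question: a <p> plus a list in one <div>.
html = (
    '<div class="description-preview body-2 css-1pbvugb">'
    '<p>Designed with every woman in mind.</p>'
    '<ul><li>Colour Shown: Black</li><li>Style: DB5268-003</li></ul>'
    '</div>'
)
soup = BeautifulSoup(html, 'html.parser')

# class_ matches any one of the element's classes, so one class is enough.
div = soup.find('div', class_='description-preview')

whole_text = div.text            # paragraph text plus the list items
paragraph = div.find('p').text   # only the paragraph

print(whole_text)
print(paragraph)
```

Calling .text on the div concatenates the text of every descendant, which is where the colour and style lines come from; narrowing to the p first leaves only the description.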
Upvotes: 1