I'm using Beautiful Soup to print certain text from a website to the console.
I'm trying to print the text of the 'p' tags (the month and day, 'Aug' and '28' in the snippet below), but only if the 'a' tag contains a child 'img' tag. For some reason, nothing is printed.
Here's the HTML (the text I want is inside the two 'p' tags):
<a data-qa="product-card-link" aria-label="Off-Line 'Black Menta' Release Date" class="card-link d-sm-b" href="/gb/launch/t/off-line-black-menta">
<div class="launch-time ta-sm-l d-sm-h d-md-b z10 mod-bg-grey pt6-sm pl6-sm">
<div class="launch-caption ta-sm-c">
<p class="mod-h2 ncss-brand u-uppercase fs19-sm fs28-md fs34-lg " data-qa="test-startDate">Aug</p>
<p class="mod-h1 ncss-brand test-day fs30-sm fs40-md" data-qa="test-day">28</p>
</div>
</div>
<img alt="Off-Line 'Black Menta' Release Date" class="image-component mod-image-component u-full-width" src="https://secure-images.nike.com/is/image/DotCom/CJ0693_002_A_PREM?$SNKRS_COVER_WD$&align=0,1" srcset="" style="opacity: 1; transition: opacity 1s ease 0s;">
</a>
Here's what I tried:
for a in soup.find_all('a', class_='card-link d-sm-b'):
    if a.find('img'):
        for p in a.find_all('p'):
            print(p.text)
You can use the CSS selector a.card-link:has(> img):has(p), which selects every <a> tag with class="card-link" that has a direct child <img> and also contains <p> tags (the <p> tags can be at any depth):
from bs4 import BeautifulSoup
txt = '''
<a data-qa="product-card-link" aria-label="Off-Line 'Black Menta' Release Date" class="card-link d-sm-b" href="/gb/launch/t/off-line-black-menta">
<div class="launch-time ta-sm-l d-sm-h d-md-b z10 mod-bg-grey pt6-sm pl6-sm">
<div class="launch-caption ta-sm-c">
<p class="mod-h2 ncss-brand u-uppercase fs19-sm fs28-md fs34-lg " data-qa="test-startDate">Aug</p>
<p class="mod-h1 ncss-brand test-day fs30-sm fs40-md" data-qa="test-day">28</p>
</div>
</div>
<img alt="Off-Line 'Black Menta' Release Date" class="image-component mod-image-component u-full-width" src="https://secure-images.nike.com/is/image/DotCom/CJ0693_002_A_PREM?$SNKRS_COVER_WD$&align=0,1" srcset="" style="opacity: 1; transition: opacity 1s ease 0s;">
</a>'''
soup = BeautifulSoup(txt, 'html.parser')
for a in soup.select('a.card-link:has(> img):has(p)'):
    all_p = [p.get_text(strip=True) for p in a.select('p')]
    print(all_p)
Prints:
['Aug', '28']
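The difference between "> img" and a plain "img" inside :has() is easy to miss. The sketch below runs against two hypothetical cards (the markup is made up for illustration, not taken from the Nike page). In the second card the <img> is wrapped in an extra <div>, so it is not a direct child of <a>. The sketch also shows an equivalent check without CSS selectors, using find(..., recursive=False) to look only at a tag's immediate children.

```python
from bs4 import BeautifulSoup

# Two hypothetical cards: in the first the <img> is a direct child of <a>,
# in the second it is wrapped in an extra <div>.
html = '''
<a class="card-link d-sm-b" href="/one">
  <div><p>Aug</p><p>28</p></div>
  <img src="one.jpg">
</a>
<a class="card-link d-sm-b" href="/two">
  <div><p>Sep</p><p>1</p></div>
  <div><img src="two.jpg"></div>
</a>
'''
soup = BeautifulSoup(html, 'html.parser')

# "> img" inside :has() only matches a direct child, so only the first card qualifies.
direct = [[p.get_text(strip=True) for p in a.select('p')]
          for a in soup.select('a.card-link:has(> img):has(p)')]

# Without ">" any descendant <img> counts, so both cards qualify.
any_depth = [[p.get_text(strip=True) for p in a.select('p')]
             for a in soup.select('a.card-link:has(img):has(p)')]

# Same direct-child test without CSS selectors:
# class_='card-link' matches any tag whose class list includes "card-link",
# and recursive=False makes find() look only at the tag's immediate children.
manual = [[p.get_text(strip=True) for p in a.find_all('p')]
          for a in soup.find_all('a', class_='card-link')
          if a.find('img', recursive=False) and a.find('p')]

print(direct)     # [['Aug', '28']]
print(any_depth)  # [['Aug', '28'], ['Sep', '1']]
print(manual)     # [['Aug', '28']]
```

If the images on the real page turn out to be nested deeper than a direct child, dropping the ">" (or the recursive=False) is the change to try.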
EDIT: To get dates and names of products, you can use this script:
import requests
from bs4 import BeautifulSoup
url = 'https://www.nike.com/gb/launch?s=upcoming'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
for d in soup.select('div.launch-caption'):
    print(d.get_text(strip=True, separator=' '),
          d.find_next('h3').get_text(strip=True),
          d.find_next('h6').get_text(strip=True))
Prints:
Aug 27 Air Jordan 3 Denim
Aug 28 Off-Line Black Menta
Aug 28 Off-Line Vast Grey
Aug 29 Air Max 1 Evergreen Aura
Aug 29 Air Jordan 12 University Gold
Sep 1 ISPA Drifter Split Iron Grey
Sep 1 ISPA Drifter Split Spruce
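The script relies on find_next(), which searches forward in document order rather than only among a tag's children, so it reaches the <h3> and <h6> that come after each caption. The sketch below runs the same logic offline against a small hypothetical snippet (the real page structure is assumed; only the tag names div.launch-caption, h3 and h6 come from the script) and skips a caption when find_next() returns None instead of raising AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; only the tag names match the script above.
html = '''
<div class="card">
  <div class="launch-caption"><p>Aug</p><p>27</p></div>
  <h3>Air Jordan 3</h3>
  <h6>Denim</h6>
</div>
<div class="card">
  <div class="launch-caption"><p>Aug</p><p>28</p></div>
  <h3>Off-Line</h3>
  <h6>Black Menta</h6>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

rows = []
for caption in soup.select('div.launch-caption'):
    # find_next() looks at everything after the caption in the document,
    # not just its children, so it finds the following <h3>/<h6>.
    name = caption.find_next('h3')
    colour = caption.find_next('h6')
    if name is None or colour is None:
        continue  # find_next() returns None when no later match exists
    rows.append(' '.join([caption.get_text(strip=True, separator=' '),
                          name.get_text(strip=True),
                          colour.get_text(strip=True)]))

for row in rows:
    print(row)
```

One caveat of this approach: if a card has no <h3> of its own, find_next() silently picks up the next card's heading, so scoping the lookup to each card's container is safer when the markup is irregular.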