Target website: Colnect, coins of Britská Guyana (British Guiana)
The relevant HTML:
<div class="i_d">
<dl>
<dt class="">Série:</dt>
<dd><a href="/cs/coins/list/country/349-Britsk%C3%A1_Guyana/series/101385-Britsk%C3%A1_Guyana_-_Standardn%C3%AD_ra%C5%BEba">Britská Guyana - Standardní ražba</a></dd>
<dt>Katalogové číslo:</dt>
<dd><strong>WCC:</strong>km22</dd>
<dt>Témata:</dt><dd><a href="/cs/coins/list/country/349-Britsk%C3%A1_Guyana/theme/641-Kr%C3%A1lov%C3%A9">Králové</a> | <a href="/cs/coins/list/country/349-Britsk%C3%A1_Guyana/theme/3134-V%C4%9Bnce">Věnce</a></dd>
...
</dl>
I am trying to get this output:
Série: Britská Guyana - Standardní ražba
Katalogové číslo: WCC:km22
Témata: Králové|Věnce
... (then the same fields for the next coin)
I tried this code:
# soup is the BeautifulSoup object for the parsed coin list page
vysledek = soup.find_all('div', attrs={'class': 'pl-it'})
for hledani_dat in vysledek:
    nazev_mince = hledani_dat.find('h2', attrs={'class': 'item_header'})
    nazev_mince_final = nazev_mince.text.strip()
    # .text of the whole <div class="i_d"> returns every dt/dd pair at once
    dd = hledani_dat.find('div', attrs={'class': 'i_d'})
    dd_final = dd.text.strip()
    print(nazev_mince_final, dd_final)
This prints the whole text of each coin's <div class="i_d"> block, i.e. every dt/dd pair.
How can I get only selected dt/dd pairs instead of all of them?
EXPECTED OUTPUT:
Témata: Králové|Věnce
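A minimal offline sketch of one way to do this, assuming BeautifulSoup 4 and a trimmed copy of the snippet above (href values shortened to "#"): pair each dt with the dd that follows it via find_next_sibling("dd") and keep only the labels you want. The names WANTED and selected_pairs are illustrative, not part of any library.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's <div class="i_d"> block (hrefs shortened to "#")
html = """
<div class="i_d">
<dl>
<dt class="">Série:</dt>
<dd><a href="#">Britská Guyana - Standardní ražba</a></dd>
<dt>Katalogové číslo:</dt>
<dd><strong>WCC:</strong>km22</dd>
<dt>Témata:</dt><dd><a href="#">Králové</a> | <a href="#">Věnce</a></dd>
</dl>
</div>
"""

# Hypothetical list of labels to keep; all other dt/dd pairs are skipped
WANTED = ["Série:", "Katalogové číslo:", "Témata:"]


def selected_pairs(block, wanted):
    """Map each wanted <dt> label to the text of the next <dd> sibling."""
    pairs = {}
    for dt in block.find_all("dt"):
        label = dt.get_text(strip=True)
        if label in wanted:
            dd = dt.find_next_sibling("dd")
            if dd is not None:
                # strip=True trims each text fragment and joins them with no
                # separator: "WCC:" + "km22" -> "WCC:km22", " | " -> "|"
                pairs[label] = dd.get_text(strip=True)
    return pairs


soup = BeautifulSoup(html, "html.parser")
info = selected_pairs(soup.select_one("div.i_d"), WANTED)
for label in WANTED:
    print(label, info.get(label, "Not present"))
```

On the real page the same function could be called once per coin block inside the loop above.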
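You can use the :contains pseudo-class to target the appropriate dt, then move to the following dd with the adjacent sibling combinator (+). Add some handling for the case where the target (e.g. Témata:) is not present. In newer versions of soupsieve, the CSS engine behind select(), :contains is deprecated in favor of :-soup-contains.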
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://colnect.com/cs/coins/list/country/349-Britsk%C3%A1_Guyana',
                 headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.content, 'lxml')

for coin in soup.select('.pl-it'):
    print('coin:', coin.select_one('.item_header a').text)
    print('-' * 20)
    # dt whose text contains "Témata:", then the dd right after it (+ combinator);
    # :-soup-contains is the current name of the older :contains selector
    target = coin.select_one('dt:-soup-contains("Témata:") + dd')
    if target is None:
        print('Not present')  # this coin has no Témata entry
    else:
        print(target.get_text())
    print()
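The same selector can be repeated for several labels to reproduce the exact "Label: value" format from the question. A minimal offline sketch, assuming bs4 with soupsieve 2.1 or later (for :-soup-contains) and a trimmed copy of the question's snippet with shortened hrefs; LABELS is an illustrative name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's <div class="i_d"> block (hrefs shortened to "#")
html = """
<div class="i_d">
<dl>
<dt class="">Série:</dt>
<dd><a href="#">Britská Guyana - Standardní ražba</a></dd>
<dt>Katalogové číslo:</dt>
<dd><strong>WCC:</strong>km22</dd>
<dt>Témata:</dt><dd><a href="#">Králové</a> | <a href="#">Věnce</a></dd>
</dl>
</div>
"""

# Hypothetical list of labels to extract, in output order
LABELS = ["Série:", "Katalogové číslo:", "Témata:"]

soup = BeautifulSoup(html, "html.parser")
block = soup.select_one("div.i_d")

lines = []
for label in LABELS:
    # dt whose text contains the label, then the dd immediately after it
    target = block.select_one(f'dt:-soup-contains("{label}") + dd')
    # strip=True joins the trimmed fragments with no separator ("Králové|Věnce")
    value = "Not present" if target is None else target.get_text(strip=True)
    lines.append(f"{label} {value}")

print("\n".join(lines))
```

Using get_text(strip=True) rather than plain get_text() is what turns "Králové | Věnce" into "Králové|Věnce" and keeps "WCC:km22" together.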