How to extract specific dl, dt list elements using BeautifulSoup

Question

I'm trying to extract the date, link, and title for news releases from this website (in Japanese):

https://www.rinnai.co.jp/releases/index.html

Here is the code that I've tried so far:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.rinnai.co.jp/releases/index.html")
c = r.content
soup = BeautifulSoup(c, "html.parser")

all = soup.find_all("dl")

My expected results are:

2019年01月09日
/releases/2019/0109/index_2.html
「深型スライドオープンタイプ」食器洗い乾燥機2019年3月1日発売 食器も調理器具もまとめて入る大容量

2019年01月09日
/releases/2019/0109/index_1.html
シンプルキッチンに似合う洗練されたドロップインコンロ 2月1日新発売 耐久性に優れたステンレストッププレート仕様のグリルレスコンロ

And my actual results are:

[
2019年01月09日

「深型スライドオープンタイプ」食器洗い乾燥機2019年3月1日発売 食器も調理器具もまとめて入る大容量
, 
2019年01月09日

シンプルキッチンに似合う洗練されたドロップインコンロ 2月1日新発売 耐久性に優れたステンレストッププレート仕様のグリルレスコンロ
,

Bitto · Accepted Answer

There is no need to complicate this; you are halfway there already. Restrict the search to the div with id index_news, then iterate through the dl elements in all and pull the date, link and title out of each one. You can then print the values or save them to a list.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.rinnai.co.jp/releases/index.html")
c = r.content
soup = BeautifulSoup(c, "html.parser")

# Only look at the news list, then collect every <dl> inside it
all = soup.find('div', id='index_news').find_all("dl")
# uncomment below line if saving to a list
# all_data = []
for dl in all:
    date = dl.find('dt').text.strip()      # the <dt> holds the release date
    link = dl.find('a')['href'].strip()    # the <a> holds the link...
    title = dl.find('a').text.strip()      # ...and its text is the title
    print(f'{date}\n{link}\n{title}\n')
    # instead of printing you can save it to a list if you want
    # uncomment below line if saving to a list
    # all_data.append([date, link, title])

Output:

2019年01月09日
/releases/2019/0109/index_2.html
「深型スライドオープンタイプ」食器洗い乾燥機2019年3月1日発売 食器も調理器具もまとめて入る大容量

2019年01月09日
/releases/2019/0109/index_1.html
シンプルキッチンに似合う洗練されたドロップインコンロ 2月1日新発売 耐久性に優れたステンレストッププレート仕様のグリルレスコンロ

...
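Once the loop has produced the raw strings (for example in all_data), you may want them in a more convenient shape. The href values are site-relative, so urllib.parse.urljoin can resolve them against the page URL, and the dates can be parsed with datetime.strptime. This is a minimal standard-library sketch, not part of the original answer: normalize_release is a hypothetical helper, and it assumes every date uses the 2019年01月09日 layout seen in the output. The sample rows are copied from the output above so the snippet runs without contacting the site.

```python
from datetime import datetime
from urllib.parse import urljoin

# Page the hrefs were scraped from; relative links are resolved against it.
BASE_URL = "https://www.rinnai.co.jp/releases/index.html"


def normalize_release(raw_date, href, title, base_url=BASE_URL):
    """Hypothetical helper: turn one scraped (date, link, title) row into a
    dict holding a date object and an absolute URL.

    Assumes the date always looks like 2019年01月09日 (year 年 month 月 day 日).
    """
    parsed = datetime.strptime(raw_date, "%Y年%m月%d日").date()
    return {
        "date": parsed,
        "url": urljoin(base_url, href),  # absolute hrefs are returned unchanged
        "title": title,
    }


# Rows in the same [date, link, title] shape that all_data.append() builds,
# copied from the output above.
all_data = [
    ["2019年01月09日", "/releases/2019/0109/index_2.html",
     "「深型スライドオープンタイプ」食器洗い乾燥機2019年3月1日発売 食器も調理器具もまとめて入る大容量"],
    ["2019年01月09日", "/releases/2019/0109/index_1.html",
     "シンプルキッチンに似合う洗練されたドロップインコンロ 2月1日新発売 耐久性に優れたステンレストッププレート仕様のグリルレスコンロ"],
]

releases = [normalize_release(*row) for row in all_data]
for release in releases:
    # e.g. 2019-01-09 https://www.rinnai.co.jp/releases/2019/0109/index_2.html
    print(release["date"].isoformat(), release["url"])
```

Keeping the parsing separate from the scraping loop means the loop stays as in the answer, and the normalization can be tested without network access.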
