Reputation: 69
I'm new to parsing. I have some simple HTML without class attributes, containing lists like:
<h2><a href="..">Title 1</a></h2>
<ol>
<li>Line 1..</li>
<li>Line 2...</li>
...
</ol>
<h2><a href="..">Title 2</a></h2>
<ol>
<li>Line 2-1..</li>
<li>Line 2-2...</li>
...
</ol>
...
and so on..
I run this code:
import requests
from bs4 import BeautifulSoup as BS
r = requests.get('http://...')
html = BS(r.content, 'html.parser')
H2 = html.find_all('h2')
for h2 in H2:
    title = h2.text
    print(title)
to get the titles, but how can I get the <ol> list that belongs to each title in the same loop?
Upvotes: 1
Views: 207
Reputation: 195408
Another solution: you can use .find_previous:
from bs4 import BeautifulSoup
txt = '''
<h2><a href="..">Title 1</a></h2>
<ol>
<li>Line 1</li>
<li>Line 2</li>
...
</ol>
<h2><a href="..">Title 2</a></h2>
<ol>
<li>Line 2-1</li>
<li>Line 2-2</li>
...
</ol>
'''
soup = BeautifulSoup(txt, 'html.parser')
out = {}
for li in soup.select('ol li'):
    # Group each list item under the text of the nearest preceding <h2>
    out.setdefault(li.find_previous('h2').text, []).append(li.text)
print(out)
Prints:
{'Title 1': ['Line 1', 'Line 2'],
'Title 2': ['Line 2-1', 'Line 2-2']}
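The same grouping can also be approached from the other direction, starting at each <h2> as in the question's loop. The following is only a minimal sketch, assuming every <h2> and its <ol> are siblings under the same parent; the helper name lists_by_title is hypothetical and used purely for illustration:

```python
from bs4 import BeautifulSoup

txt = '''
<h2><a href="..">Title 1</a></h2>
<ol>
<li>Line 1</li>
<li>Line 2</li>
</ol>
<h2><a href="..">Title 2</a></h2>
<ol>
<li>Line 2-1</li>
<li>Line 2-2</li>
</ol>
'''

soup = BeautifulSoup(txt, 'html.parser')


def lists_by_title(soup):
    """Map each <h2> text to the <li> texts of the next <ol> sibling.

    Hypothetical helper: assumes <h2> and <ol> share the same parent.
    """
    out = {}
    for h2 in soup.find_all('h2'):
        # Next sibling <ol> after this heading, or None if there is none
        ol = h2.find_next_sibling('ol')
        if ol is not None:
            items = [li.get_text(strip=True) for li in ol.find_all('li')]
        else:
            items = []
        out[h2.get_text(strip=True)] = items
    return out


result = lists_by_title(soup)
print(result)
```

One caveat: find_next_sibling('ol') returns the next <ol> sibling even when another <h2> sits in between, so a heading without its own list would pick up the following heading's list.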
Upvotes: 1
Reputation: 12672
An easy way is to use zip. Try:
from bs4 import BeautifulSoup as BS
source = '''
<h2><a href="..">Title 1</a></h2>
<ol>
<li>Line 1..</li>
<li>Line 2...</li>
</ol>
<h2><a href="..">Title 2</a></h2>
<ol>
<li>Line 2-1..</li>
<li>Line 2-2...</li>
</ol>
'''
html = BS(source, 'html.parser')
for title, element in zip(html.find_all('h2'), html.find_all('ol')):
    # Pairs the n-th <h2> with the n-th <ol> in document order
    print(title.text, element.text)
Result:
Title 1
Line 1..
Line 2...
Title 2
Line 2-1..
Line 2-2...
Note: if the number of <h2> and <ol> elements differs, you could use itertools.zip_longest instead of zip. It pads the shorter sequence with None, so check for None before accessing .text.
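A minimal sketch of the zip_longest variant, assuming a page where the last title has no list. The sample HTML and the variable name pairs are made up for illustration:

```python
from itertools import zip_longest

from bs4 import BeautifulSoup

# Hypothetical input: the second <h2> has no <ol> after it
source = '''
<h2><a href="..">Title 1</a></h2>
<ol>
<li>Line 1..</li>
</ol>
<h2><a href="..">Title 2</a></h2>
'''

html = BeautifulSoup(source, 'html.parser')

pairs = []
for title, element in zip_longest(html.find_all('h2'), html.find_all('ol')):
    # zip_longest fills the shorter sequence with None, so guard both sides
    title_text = title.get_text(strip=True) if title is not None else None
    if element is not None:
        items = [li.get_text(strip=True) for li in element.find_all('li')]
    else:
        items = []
    pairs.append((title_text, items))

print(pairs)
```

The pairing is still purely positional: if a list is missing in the middle of the page rather than at the end, later titles would be matched with the wrong lists, so this only helps when the mismatch is at the tail.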
Upvotes: 1