mstr

Reputation: 23

Python: Content of <li> in lists with BeautifulSoup

I have the following data:

<li>
  <div>Content1</div>
</li>
<li>
  <div>Content2</div>
  <div>Content3</div>
  <div>Content4</div>
</li>
<li>
  <div>Content5</div>
  <div>Content6</div>
</li>

I want to put the content of each li element into a separate list with BeautifulSoup. This should be the result:

List1 = ['Content1']
List2 = ['Content2', 'Content3', 'Content4']
List3 = ['Content5', 'Content6']

A line like div = [a.get_text(strip=True) for a in soup.select('li>div')] puts all of the content into one list. I am struggling to create a separate list for each li element and fill it with the right content. Can someone help?

Upvotes: 2

Views: 52

Answers (2)

Adelin

Reputation: 8219

You just need to create a new list for each li, like this:

divs = [[div.get_text(strip=True) for div in li.find_all("div")] for li in soup.select('li')]
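For reference, here is a self-contained sketch of the same idea written as an explicit loop. It assumes the HTML from the question is wrapped in a <ul> and parsed with html.parser. The recursive=False argument is an addition, not part of the one-liner above: it restricts matches to direct children, mirroring the li>div selector from the question. Drop it if nested divs should be collected too.

```python
from bs4 import BeautifulSoup

# Assumed input: the question's markup, wrapped in a <ul> for completeness
html = """<ul>
<li><div>Content1</div></li>
<li><div>Content2</div><div>Content3</div><div>Content4</div></li>
<li><div>Content5</div><div>Content6</div></li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

divs = []
for li in soup.select("li"):
    # recursive=False keeps only <div> tags that are direct children of this <li>
    texts = [div.get_text(strip=True) for div in li.find_all("div", recursive=False)]
    divs.append(texts)

print(divs)
# [['Content1'], ['Content2', 'Content3', 'Content4'], ['Content5', 'Content6']]
```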

Upvotes: 1

Rakesh

Reputation: 82785

You can use a nested list comprehension.

Example:

from bs4 import BeautifulSoup

html = """<ul>
<li>
  <div>Content1</div>
</li>
<li>
  <div>Content2</div>
  <div>Content3</div>
  <div>Content4</div>
</li>
<li>
  <div>Content5</div>
  <div>Content6</div>
</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")
print([[j.get_text(strip=True) for j in i.find_all("div")] for i in soup.find_all("li")])

Output:

[['Content1'], ['Content2', 'Content3', 'Content4'], ['Content5', 'Content6']]
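If separate names are really wanted, as in the question, the nested list can be unpacked afterwards. This is only a sketch: the names list1, list2, list3 and by_name are illustrative, and the tuple unpacking assumes exactly three li elements. The dictionary variant works for any number of them.

```python
# The nested list produced above, copied here so the snippet runs on its own
result = [['Content1'], ['Content2', 'Content3', 'Content4'], ['Content5', 'Content6']]

# Works only when the number of <li> elements is known in advance (three here)
list1, list2, list3 = result

# For an unknown number of <li> elements, key each inner list by position instead
by_name = {f"List{i}": items for i, items in enumerate(result, start=1)}

print(list2)
print(by_name["List3"])
```

In most cases it is simpler to keep the nested list and index into it (result[0], result[1], ...) than to create one variable per li.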

Upvotes: 2
