Reputation: 63
I have extracted HTML text using BeautifulSoup. The text is organized into multiple groups on the website, with information for each group.
There are multiple label/value pairs for each group. I want to extract the label/value pairs for each group so that I can match that information to its group. For example:
Group1:
<div class="table-cell table-cell--data">
    "other code"
    <li class="data-list__item">
        <span class="data-list__label">Title:</span>
        <span class="data-list__value">Cows</span>
    </li>
    <li class="data-list__item">
        <span class="data-list__label">Color:</span>
        <span class="data-list__value">Blue</span>
    </li>
</div>
There are multiple lists inside each group, each with multiple values.
I want the final output to be: Group1_labels = [Title, Color] and Group1_value = [Cows, Blue].
I have written the code below to build a list in which items[0] returns only the first list item shown above for Group1.
import bs4

# `site` is the response object that was already fetched for the page
soup = bs4.BeautifulSoup(site.text, 'html.parser')
groups = soup.select('.table-cell--data')
items = []  # named `items` to avoid shadowing the built-in `list`
for group in groups:
    for li in group.find_all('li', class_='data-list__item'):
        items.append(li)
Upvotes: 1
Views: 1046
Reputation: 33384
Try the following CSS selectors.
import bs4

soup = bs4.BeautifulSoup(site.text, "html.parser")
for group in soup.select('.table-cell--data'):
    # Collect the label and value text within this group only
    labels = [label.text for label in group.select('.data-list__label')]
    values = [value.text for value in group.select('.data-list__value')]
    print(labels)
    print(values)
Based on your example:
import bs4

html = '''<div class="table-cell table-cell--data">
"other code"
<li class="data-list__item">
<span class="data-list__label">Title:</span>
<span class="data-list__value">Cows</span>
</li>
<li class="data-list__item">
<span class="data-list__label">Color:</span>
<span class="data-list__value">Blue</span>
</li>
</div>'''
soup = bs4.BeautifulSoup(html, "html.parser")
for group in soup.select('.table-cell--data'):
    labels = [label.text for label in group.select('.data-list__label')]
    values = [value.text for value in group.select('.data-list__value')]
    print(labels)
    print(values)
Output:
['Title:', 'Color:']
['Cows', 'Blue']
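If you want to keep the results per group instead of printing them, something along these lines might work. This is only a sketch: the two-group sample HTML, the `extract_groups` helper name, and the choice to strip the trailing colon from labels are my assumptions, not something taken from your page. It walks each `li` so that a label always stays paired with its own value, even if an item is missing one of the two spans.

```python
import bs4

# Hypothetical sample with two groups; the real page will differ.
html = '''
<div class="table-cell table-cell--data">
  <ul>
    <li class="data-list__item">
      <span class="data-list__label">Title:</span>
      <span class="data-list__value">Cows</span>
    </li>
    <li class="data-list__item">
      <span class="data-list__label">Color:</span>
      <span class="data-list__value">Blue</span>
    </li>
  </ul>
</div>
<div class="table-cell table-cell--data">
  <ul>
    <li class="data-list__item">
      <span class="data-list__label">Title:</span>
      <span class="data-list__value">Sheep</span>
    </li>
  </ul>
</div>
'''


def extract_groups(markup):
    """Return one {'labels': [...], 'values': [...]} dict per group."""
    soup = bs4.BeautifulSoup(markup, "html.parser")
    results = []
    for group in soup.select(".table-cell--data"):
        labels, values = [], []
        # Walk each <li> so a label and its value stay paired.
        for item in group.select("li.data-list__item"):
            label = item.select_one(".data-list__label")
            value = item.select_one(".data-list__value")
            if label is None or value is None:
                continue  # skip items missing either half
            # Assumption: the trailing colon is not wanted in the label.
            labels.append(label.get_text(strip=True).rstrip(":"))
            values.append(value.get_text(strip=True))
        results.append({"labels": labels, "values": values})
    return results


groups = extract_groups(html)
print(groups[0]["labels"])  # ['Title', 'Color']
print(groups[0]["values"])  # ['Cows', 'Blue']
```

Here `groups[0]` plays the role of Group1, `groups[1]` of Group2, and so on. If a mapping is more convenient, `dict(zip(g["labels"], g["values"]))` turns a group `g` into a label-to-value dictionary.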
Upvotes: 1