Alexander Chrome

Reputation: 63

BeautifulSoup: Extract and group label/value text

I have extracted HTML from a website using BeautifulSoup. The page is organized into multiple groups, each with its own information.

Each group contains multiple label/value pairs. I want to extract the label/value pairs per group so that I can match the information to the group it belongs to. For example:

Group1:

<div class="table-cell table-cell--data">
"other code"
  <li class="data-list__item">
      <span class="data-list__label">Title:</span>
      <span class="data-list__value">Cows</span>
  </li>
  <li class="data-list__item">
      <span class="data-list__label">Color:</span>
      <span class="data-list__value">Blue</span>
  </li>

There are multiple lists inside of each group with multiple values.

I want the final output to be: Group1_labels = [Title, Color] and Group1_value = [Cows, Blue].

I have written the code below to build a list, items, where items[0] should return only the text of the first list item shown above for Group1.

import bs4

# `site` is assumed to be a response object fetched earlier, e.g. with requests.get(url)
soup = bs4.BeautifulSoup(site.text, 'html.parser')
groups = soup.select('.table-cell--data')
items = []
for group in groups:
    for li in group.find_all('li', class_='data-list__item'):
        items.append(li)

Upvotes: 1

Views: 1046

Answers (1)

KunduK

Reputation: 33384

Try the following CSS selectors.

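import bs4

# `site` is the response object from the question (e.g. returned by requests.get)
soup = bs4.BeautifulSoup(site.text, "html.parser")

for group in soup.select('.table-cell--data'):
    # Collect every label and value inside this group, in document order
    labels = [label.text for label in group.select('.data-list__label')]
    values = [value.text for value in group.select('.data-list__value')]

    print(labels)
    print(values)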

Based on your example:

import bs4

html = '''<div class="table-cell table-cell--data">
"other code"
  <li class="data-list__item">
      <span class="data-list__label">Title:</span>
      <span class="data-list__value">Cows</span>
  </li>
  <li class="data-list__item">
      <span class="data-list__label">Color:</span>
      <span class="data-list__value">Blue</span>
  </li>
</div>'''

soup = bs4.BeautifulSoup(html, "html.parser")

for group in soup.select('.table-cell--data'):
    labels = [label.text for label in group.select('.data-list__label')]
    values = [value.text for value in group.select('.data-list__value')]

    print(labels)
    print(values)

Output:

['Title:', 'Color:']
['Cows', 'Blue']
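
If you would rather keep each label tied to its own value (for instance, if an item could be missing one of the two spans, the two separate lists above would drift out of alignment), one option is to walk the `.data-list__item` elements and pair the spans per item. This is only a sketch based on the sample markup: the `groups` list, the `<ul>` wrapper, and stripping the trailing colon from labels are my own assumptions, not something taken from your page.

```python
import bs4

# Sample markup modeled on the question; the <ul> wrapper is an assumption.
html = '''<div class="table-cell table-cell--data">
  <ul>
    <li class="data-list__item">
      <span class="data-list__label">Title:</span>
      <span class="data-list__value">Cows</span>
    </li>
    <li class="data-list__item">
      <span class="data-list__label">Color:</span>
      <span class="data-list__value">Blue</span>
    </li>
  </ul>
</div>'''

soup = bs4.BeautifulSoup(html, "html.parser")

groups = []  # hypothetical container: one dict per group, in document order
for group in soup.select(".table-cell--data"):
    pairs = {}
    for item in group.select(".data-list__item"):
        label = item.select_one(".data-list__label")
        value = item.select_one(".data-list__value")
        if label is None or value is None:
            continue  # skip items that lack either half of the pair
        # Trim whitespace and drop the trailing colon from the label
        key = label.get_text(strip=True).rstrip(":")
        pairs[key] = value.get_text(strip=True)
    groups.append(pairs)

print(groups)
```

With the sample above, `groups[0]` holds the pairs for the first group, and `list(groups[0])` and `list(groups[0].values())` give the separate label and value lists if you still need them.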

Upvotes: 1
