Alex TheWebGroup

Reputation: 175

Scraping table with BeautifulSoup4

I am trying to scrape particular rows from a table, but I don't know how to access the information properly. Here is the HTML:

<tr class="even">
  <td style="background: #F5645C; color: #F5645C;">1&#160;</td>
  <td>Michael</td>
  <td class="right">57</td>
  <td class="right">0</td>
  <td class="right">5</td>
</tr>
<tr class="odd">
  <td style="background: #8FB9B0; color: #8FB9B0;">1&#160;</td>
  <td>Clara</td>
  <td class="right">48</td>
  <td class="right">0</td>
  <td class="right">5</td>
</tr>
<tr class="even">
  <td style="background: #F5645C; color: #F5645C;">1&#160;</td>
  <td>Lisa</td>
  <td class="right">44</td>
  <td class="right">2</td>
  <td class="right">5</td>
</tr>
<tr class="odd">
  <td style="background: #8FB9B0; color: #8FB9B0;">0&#160;</td>
  <td>Joe</td>
  <td class="right">43</td>
  <td class="right">0</td>
  <td class="right">13</td>
</tr>
<tr class="even">
  <td style="background: #F5645C; color: #F5645C;">1&#160;</td>
  <td>John</td>
  <td class="right">38</td>
  <td class="right">3</td>
  <td class="right">4</td>
</tr>
<tr class="odd">
  <td style="background: #F5645C; color: #F5645C;">1&#160;</td>
  <td>Francesca</td>
  <td class="right">35</td>
  <td class="right">2</td>
  <td class="right">5</td>
</tr>
<tr class="even">
  <td style="background: #8FB9B0; color: #8FB9B0;">0&#160;</td>
  <td>Carlos</td>
  <td class="right">27</td>
  <td class="right">1</td>
  <td class="right">2</td>
</tr>

What I want is the text of the td that immediately follows every td whose style uses the color #F5645C, but I am running into problems. This is what I want the script to return: Michael Lisa John Francesca

Here is the code I currently have:

table = soup.find('table')
table_rows = table.find_all('tr')

for tr in table_rows:
    td = tr.find('td', style='background: #F5645C; color: #F5645C;').find_next_sibling('td').get_text()
    print(td)

Running the script raises: AttributeError: 'NoneType' object has no attribute 'find_next_sibling'

Upvotes: 0

Views: 103

Answers (4)

Andrej Kesely

Reputation: 195438

You can use a CSS selector to select all <td> tags whose style attribute contains the string color: #F5645C, and then call find_next() on each of them:

for td in soup.select('td[style*="color: #F5645C"]'):
  print(td.find_next('td').text)

This prints:

Michael
Lisa
John
Francesca
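
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same selector approach. The HTML is a shortened copy of the table from the question, the wrapping <table> tag and the names list are assumptions added for illustration, and find_next_sibling() is swapped in for find_next() so the lookup cannot leave the current row.

```python
from bs4 import BeautifulSoup

# Shortened sample of the question's table; the <table> wrapper is assumed.
html = """
<table>
<tr class="even"><td style="background: #F5645C; color: #F5645C;">1&#160;</td><td>Michael</td><td class="right">57</td></tr>
<tr class="odd"><td style="background: #8FB9B0; color: #8FB9B0;">1&#160;</td><td>Clara</td><td class="right">48</td></tr>
<tr class="even"><td style="background: #F5645C; color: #F5645C;">1&#160;</td><td>Lisa</td><td class="right">44</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

names = []
# [style*="..."] matches any td whose style attribute contains the substring.
for td in soup.select('td[style*="color: #F5645C"]'):
    # Stay inside the same row: only look at later <td> siblings.
    sibling = td.find_next_sibling("td")
    if sibling is not None:
        names.append(sibling.get_text(strip=True))

print(names)
```

With this shortened sample only the Michael and Lisa rows carry the matching style, so those are the two names collected.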

Upvotes: 1

Ajay Bisht

Reputation: 585

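You can use find_all with a filter on the style attribute. The value has to match the attribute exactly as it is written in the HTML:

from bs4 import BeautifulSoup
bs = BeautifulSoup(htmlcontent, 'html.parser')
# Exact match against the style attribute of the marker cells
bs.find_all('td', attrs={'style': 'background: #F5645C; color: #F5645C;'})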
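
An exact string filter only works while every marker cell writes its style identically. As a hedged alternative, find_all also accepts a compiled regular expression for an attribute value, which tolerates spacing differences. In the sketch below the pattern, the variable names, and the third row's compact style string are assumptions made up for illustration; they are not from the question.

```python
import re
from bs4 import BeautifulSoup

# Sample rows; the last row's compact style string is a hypothetical variation.
html = """
<table>
<tr><td style="background: #F5645C; color: #F5645C;">1&#160;</td><td>Michael</td></tr>
<tr><td style="background: #8FB9B0; color: #8FB9B0;">1&#160;</td><td>Clara</td></tr>
<tr><td style="background:#F5645C;color:#F5645C">1&#160;</td><td>John</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Match "color: #F5645C" with any amount of whitespace after the colon.
red_style = re.compile(r"color:\s*#F5645C", re.IGNORECASE)

names = []
for cell in soup.find_all("td", style=red_style):
    neighbour = cell.find_next_sibling("td")
    if neighbour is not None:
        names.append(neighbour.get_text(strip=True))

print(names)
```

The plain cells (Michael, Clara, John) have no style attribute at all, so the regular expression is never tested against them and they are not selected as markers.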

Upvotes: 0

Rakesh

Reputation: 82765

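Use .findNext("td").text (findNext is the older spelling of find_next). Note that this prints the second cell of every row, without filtering on the style.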

Ex:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
for tr in soup.find_all("tr"):
    print(tr.td.findNext("td").text)

Output:

Michael
Clara
Lisa
Joe
John
Francesca
Carlos
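
The output above lists every row, including the ones the question wants to skip. The AttributeError in the question comes from tr.find(...) returning None for rows that have no cell with the requested style. A minimal sketch of the question's own loop with a None check added might look like this; the sample rows, the STYLE constant, and the names list are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Three sample rows copied from the question (columns trimmed for brevity).
html = """
<table>
<tr class="even"><td style="background: #F5645C; color: #F5645C;">1&#160;</td><td>Michael</td></tr>
<tr class="odd"><td style="background: #8FB9B0; color: #8FB9B0;">0&#160;</td><td>Joe</td></tr>
<tr class="odd"><td style="background: #F5645C; color: #F5645C;">1&#160;</td><td>Francesca</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact style string used by the marker cells in the question.
STYLE = "background: #F5645C; color: #F5645C;"

names = []
for tr in soup.find_all("tr"):
    marker = tr.find("td", style=STYLE)
    if marker is None:
        # No matching cell in this row, so there is nothing to read.
        continue
    names.append(marker.find_next_sibling("td").get_text())

print(names)
```

Skipping the rows where find() returns None keeps the rest of the original logic unchanged.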

Upvotes: 1

iamklaus

Reputation: 3770

from bs4 import BeautifulSoup
data = BeautifulSoup(html, 'html.parser')
for tr in data.find_all('tr'):
    td = tr.find_all('td')
    print(td[1].text)

From here you can take it further, for example by checking the style of td[0] before printing.
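
One way to take the index-based loop further is to inspect the first cell's style before reading the second cell. This is only a sketch: the sample rows are copied from the question, while the substring test and the names list are assumptions chosen for illustration.

```python
from bs4 import BeautifulSoup

# Three sample rows copied from the question (columns trimmed for brevity).
html = """
<table>
<tr class="even"><td style="background: #F5645C; color: #F5645C;">1&#160;</td><td>Lisa</td></tr>
<tr class="even"><td style="background: #8FB9B0; color: #8FB9B0;">0&#160;</td><td>Carlos</td></tr>
<tr class="even"><td style="background: #F5645C; color: #F5645C;">1&#160;</td><td>John</td></tr>
</table>
"""

data = BeautifulSoup(html, "html.parser")

names = []
for tr in data.find_all("tr"):
    cells = tr.find_all("td")
    if len(cells) < 2:
        # Header or malformed row: nothing to compare or read.
        continue
    # .get() returns "" when the first cell has no style attribute.
    if "#F5645C" in cells[0].get("style", ""):
        names.append(cells[1].text)

print(names)
```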

Upvotes: 1
