spak

Reputation: 253

How to get the second <td>'s data when the first <td> contains a desired value?

I am trying to write a for loop to retrieve some data and I am currently stuck. For each table row, I need to get the text of the second <td> when the first <td> contains "Primary NAICS Code".

 <td class="col_left"><strong>Primary NAICS Code</strong></td>
 <td align="left">                                                        

  311811 : Retail Bakeries                                                    
                                                 </td>

My for loop, which is obviously not working, looks like this:

for i, elem in enumerate(all_trs):
    inside_td = elem.find("td")
    if "NAICS" in inside_td.text:
        inside_td = elem.find("td")
        print(inside_td.text)

Really appreciate any help I could get. Thank you very much in advance.

Upvotes: 0

Views: 821

Answers (2)

David Zemens

Reputation: 53623

Untested, but instead of:

for i, elem in enumerate(all_trs):
    inside_td = elem.find("td")
    if "NAICS" in inside_td.text:
        inside_td = elem.find("td")
        print(inside_td.text)

Try this:

for i, elem in enumerate(all_trs):
    td_elems = elem.findAll('td')
    if 'NAICS' in td_elems[0].text:
        print(td_elems[1].text)

Explanation:

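The findAll method (in Beautiful Soup 4, the older spelling of find_all) returns a list of the matching td elements, so keep a reference to that list and index into it: td_elems[0] is the first cell and td_elems[1] is the second.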

find_all(self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)

Extracts a list of Tag objects that match the given criteria. You can specify the name of the Tag and any attributes you want the Tag to have.
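As a minimal sketch of the indexing approach, assuming Beautiful Soup 4 is installed: the table below is made up around the row from the question (the "Company Name" row and the helper name second_cell_for are illustrative only, not from the original post).

```python
from bs4 import BeautifulSoup

# Hypothetical sample table; only the NAICS row comes from the question.
html = """
<table>
  <tr>
    <td class="col_left"><strong>Company Name</strong></td>
    <td align="left">Example Bakery</td>
  </tr>
  <tr>
    <td class="col_left"><strong>Primary NAICS Code</strong></td>
    <td align="left">

      311811 : Retail Bakeries
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def second_cell_for(label, rows):
    """Return the stripped text of the second <td> in the first row
    whose first <td> contains `label`, or None if no row matches."""
    for row in rows:
        cells = row.find_all("td")
        # Skip rows that do not have at least two cells.
        if len(cells) >= 2 and label in cells[0].get_text():
            return cells[1].get_text(strip=True)
    return None


naics = second_cell_for("NAICS", soup.find_all("tr"))
print(naics)  # prints: 311811 : Retail Bakeries
```

The length check is there so that header or spacer rows with fewer than two cells do not raise an IndexError.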

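The find method returns only the first matching td element. When at least one match exists, that is the same element as findAll('td')[0]; when nothing matches, find returns None, whereas indexing the empty findAll result raises an IndexError.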

find(self, name=None, attrs={}, recursive=True, text=None, **kwargs)

Return only the first child of this Tag matching the given criteria.
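A small sketch of how the two methods differ, again assuming Beautiful Soup 4; the two one-line rows are invented for illustration.

```python
from bs4 import BeautifulSoup

# Two tiny hypothetical rows: one with cells, one without.
row = BeautifulSoup("<tr><td>first</td><td>second</td></tr>", "html.parser").tr
empty_row = BeautifulSoup("<tr></tr>", "html.parser").tr

# With at least one match, both approaches give the same Tag object.
first_via_find = row.find("td")
first_via_find_all = row.find_all("td")[0]
same_element = first_via_find is first_via_find_all

# With no match, find gives None while find_all gives an empty result,
# so find_all("td")[0] would raise an IndexError here.
missing_via_find = empty_row.find("td")
missing_via_find_all = empty_row.find_all("td")

print(same_element, missing_via_find, len(missing_via_find_all))
```

This is why the original loop kept printing the label cell: calling find("td") a second time returns the first cell again rather than moving on to the next one.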

Upvotes: 1

DYZ

Reputation: 57033

The value is in the second next sibling of the <td> that contains the string of interest; the immediate next sibling is the whitespace (line break) text node between the two cells:

import re
...
soup.body.findAll('td', text=re.compile('Primary NAICS Code'))[0]\
         .next_sibling.next_sibling

#<td align="left">                                                        
#
#  311811 : Retail Bakeries                                                    
#                                                 </td>
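A variation on the same idea, sketched under the assumption that Beautiful Soup 4 is in use: find_next_sibling("td") skips the whitespace text node, so there is no need to count siblings by hand. The variable names and the trimmed-down sample markup are illustrative only.

```python
import re
from bs4 import BeautifulSoup

# Trimmed-down version of the row from the question.
html = """<table><tr>
<td class="col_left"><strong>Primary NAICS Code</strong></td>
<td align="left">
  311811 : Retail Bakeries
</td>
</tr></table>"""

soup = BeautifulSoup(html, "html.parser")

# Locate the label text, then climb to the <td> that encloses it.
# (find returns None if the label is absent; a real script should check.)
label_text = soup.find(string=re.compile("Primary NAICS Code"))
label_td = label_text.find_parent("td")

# The immediate next sibling is only the line break between the cells.
gap = label_td.next_sibling

# find_next_sibling("td") jumps straight to the next <td> element.
value_td = label_td.find_next_sibling("td")
value = value_td.get_text(strip=True)

print(repr(gap))
print(value)  # prints: 311811 : Retail Bakeries
```

Selecting the sibling by tag name keeps working if the whitespace between the cells changes, which chaining .next_sibling.next_sibling does not.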

Upvotes: 0
