Madhu Kiran Arja

Reputation: 11

How to extract 2nd column in html table in python?

<table style="width:300px" border="1">
<tr>
  <td>John</td>
  <td>Doe</td>      
  <td>80</td>
</tr>
<tr>
  <td>ABC</td>
  <td>abcd</td>     
  <td>80</td>
</tr>
<tr>
  <td>EFC</td>
  <td>efc</td>      
  <td>80</td>
</tr>
</table>

I need to grab all the td elements in column 2 using Python. I am new to Python.

import urllib2
from bs4 import BeautifulSoup

url = "http://ccdsiu.byethost33.com/magento/adamo-13.html"
text = urllib2.urlopen(url).read()
soup = BeautifulSoup(text, "html.parser")  # name the parser explicitly
data = soup.findAll('div', attrs={'class': 'madhu'})
for div in data:
    trdata = div.findAll('tr')
    tddata = div.findAll('td')
    for trr in trdata:
        print(trr)  # prints each whole row, not just column 2

I am trying to get the data with the code above, but it prints every row of the table (all of the td elements), not just the second column. I would like to achieve this with XPath.

Upvotes: 0

Views: 1986

Answers (3)

heinst

Reputation: 8786

It isn't really clear what you want, since your sample HTML isn't the page you are actually scraping, and "just the second-column tds" isn't much of a description. Anyway, I modified Elmo's answer to give you the Importance title and then the actual importance level of each item.

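for div in data:
    tddata = div.findAll('td')
    # Each row of this page's table has six cells, so index i + 1
    # is the second cell of the row that starts at index i.
    # Stopping at len - 1 avoids an IndexError on a trailing cell.
    for i in range(0, len(tddata) - 1, 6):
        print(tddata[i + 1])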
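
Because findAll('td') returns one flat list of cells, picking every sixth cell starting at index 1 yields the second column of a table with six cells per row. An extended slice expresses the same selection without any loop arithmetic. The sketch below is only an illustration: it uses the question's three-column sample table, so the step is 3 rather than 6, plain strings stand in for the td tags, and it assumes every row has exactly the same number of cells.

```python
# Flat list of cell texts in document order, as findAll('td') would
# return them, taken from the question's three-column sample table.
cells = ["John", "Doe", "80",
         "ABC", "abcd", "80",
         "EFC", "efc", "80"]

CELLS_PER_ROW = 3  # assumption: every row has exactly this many cells

# Start at index 1 (the second cell) and jump one full row at a time.
second_column = cells[1::CELLS_PER_ROW]
print(second_column)  # ['Doe', 'abcd', 'efc']
```

If a row can have a missing or extra cell (colspan, empty rows), this flat-list approach silently shifts into the wrong column, so indexing each row separately is safer.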

Upvotes: 0

FatalError

Reputation: 54551

I don't think you can use XPath with BeautifulSoup the way you described. However, the third-party lxml module (it does not ship with Python; install it with pip) can do it.

from lxml import etree

table = '''
<table style="width:300px" border="1">
<tr>
  <td>John</td>
  <td>Doe</td>      
  <td>80</td>
</tr>
<tr>
  <td>ABC</td>
  <td>abcd</td>     
  <td>80</td>
</tr>
<tr>
  <td>EFC</td>
  <td>efc</td>      
  <td>80</td>
</tr>
</table>
'''

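parser = etree.HTMLParser()
tree = etree.fromstring(table, parser)
# The predicate is evaluated per tr, so this selects
# the second td of every row (td[2] is equivalent).
results = tree.xpath('//tr/td[position()=2]')

print('Column 2\n========')
for r in results:
    print(r.text)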

When run, this prints:

Column 2
========
Doe
abcd
efc
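
For a dependency-free alternative: when the markup is well-formed XML (as this sample table is), the standard library's xml.etree.ElementTree understands a limited XPath subset that includes positional predicates such as the same per-row td[2]. This is a minimal sketch under that assumption; ElementTree is an XML parser, not an HTML parser, so it will reject most real-world pages, where lxml's HTML parser remains the better tool.

```python
import xml.etree.ElementTree as ET

table = '''
<table style="width:300px" border="1">
<tr><td>John</td><td>Doe</td><td>80</td></tr>
<tr><td>ABC</td><td>abcd</td><td>80</td></tr>
<tr><td>EFC</td><td>efc</td><td>80</td></tr>
</table>
'''

# Works only because this snippet is well-formed XML.
root = ET.fromstring(table.strip())

# ElementTree's XPath subset: td[2] is the second td child of each tr.
column2 = [td.text for td in root.findall('.//tr/td[2]')]
print(column2)  # ['Doe', 'abcd', 'efc']
```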

Upvotes: 1

ElmoVanKielmo

Reputation: 11290

You don't have to iterate over your td elements. Use this:

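for div in data:
    for tr in div.findAll('tr'):
        tddata = tr.findAll('td')
        # only print if this row actually has a second cell
        if len(tddata) >= 2:
            print(tddata[1])

Lists are indexed starting from 0, so tddata[1] is the second td. Looking up the cells per tr (rather than across the whole div) gives the second cell of every row, and the length check makes sure that cell exists.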
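
The key point is to index each row's cells separately rather than one flat list of every td in the div. For readers who want to avoid third-party packages, here is a minimal sketch of that per-row idea using only the standard library's html.parser.HTMLParser. RowCollector is a hypothetical helper written for this example; it assumes every td has an explicit closing tag and ignores nested tables, th cells and colspan.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Hypothetical helper: gathers the text of each <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text fragments of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_data(self, data):
        # Text outside a <td> (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


html = """
<table style="width:300px" border="1">
<tr><td>John</td><td>Doe</td><td>80</td></tr>
<tr><td>ABC</td><td>abcd</td><td>80</td></tr>
<tr><td>EFC</td><td>efc</td><td>80</td></tr>
</table>
"""

collector = RowCollector()
collector.feed(html)
collector.close()

# Index each row separately; skip rows that have no second cell.
second_column = [row[1] for row in collector.rows if len(row) >= 2]
print(second_column)  # ['Doe', 'abcd', 'efc']
```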

Upvotes: 0
