pygeek

Reputation: 11

Parsing using lxml and requests with python

Recently I was trying to parse an HTML table from a web page using lxml and requests.

The Python code looks like this:

>>> from lxml import html
>>> import requests
>>> page = requests.get('http://www.bigpaisa.com/candlestick-stock-screener-result/nse/bearish-evening-star-candlestick-pattern')
>>> tree = html.fromstring(page.text)

I would then like to parse the following repeating block of data with lxml's xpath() method to get lists:

<TR>
    <TD style="font-size: 11px;"><!-- <a href="/company-technical-details/<%=sr.getExchange()%>/<%=sr.getSymbol()%>/<%=sr.getName()%>" ><%= sr.getSymbol() %></a>  -->
                    AMTEKINDIA           </TD>
    <TD style="font-size: 11px; max-width: 135px;">AMTEK INDIA LIMITED</TD>
    <TD>                nse         </TD>
    <TD style="min-width: 60px; max-width: 60px;">02-01-2015</TD>
    <TD>78</TD>
    <TD>78.3</TD>
    <TD>72.25</TD>
    <TD>73.9</TD>

But I am unable to do so. This expression, for example:

>>> symbol=tree.xpath('//TD[@style="font-size: 11px;"][@!-- [@a href="/company-technical-details/[@%=sr.getExchange()%]/[@%=sr.getSymbol()%]/[@%=sr.getName()%]" ][@%= sr.getSymbol() %][@/a]  --]/text()')

raises an XPath evaluation error, and

>>> prices=tree.xpath('//TD/text()')

returns an empty list.
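A likely reason for the empty list: lxml's HTML parser stores element names in lowercase, and XPath name tests are case-sensitive, so //TD matches nothing even though the page source says <TD>. The sketch below parses a copy of the row above (wrapped in a <table>, an assumption made only so the fragment parses as a table) and compares the two expressions. It also suggests that the first cell needs no special XPath for its HTML comment: a comment is not a text node, so normalize-space() on the cell returns just the symbol.

```python
from lxml import html

# A copy of the row from the question, wrapped in a table for parsing.
ROW_HTML = """
<TABLE>
<TR>
    <TD style="font-size: 11px;"><!-- <a href="/company-technical-details/<%=sr.getExchange()%>/<%=sr.getSymbol()%>/<%=sr.getName()%>" ><%= sr.getSymbol() %></a>  -->
                    AMTEKINDIA           </TD>
    <TD style="font-size: 11px; max-width: 135px;">AMTEK INDIA LIMITED</TD>
    <TD>                nse         </TD>
    <TD style="min-width: 60px; max-width: 60px;">02-01-2015</TD>
    <TD>78</TD>
    <TD>78.3</TD>
    <TD>72.25</TD>
    <TD>73.9</TD>
</TR>
</TABLE>
"""

tree = html.fromstring(ROW_HTML)

# The parser stores tag names in lowercase, so the uppercase name test finds nothing.
upper = tree.xpath('//TD/text()')
lower = tree.xpath('//td')

# normalize-space() uses the cell's string value (text nodes only, comments
# excluded) and trims/collapses whitespace, so the commented-out link is skipped.
cells = [td.xpath('normalize-space()') for td in lower]

print(upper)  # []
print(cells)  # ['AMTEKINDIA', 'AMTEK INDIA LIMITED', 'nse', '02-01-2015', '78', '78.3', '72.25', '73.9']
```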

Upvotes: 1

Views: 4895

Answers (1)

Tomalak

Reputation: 338248

The rows you are interested in are inside the <table> with the ID sortable.
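To illustrate that scoping offline, here is a sketch against a hypothetical page (the markup, the decoy table and the second data row are placeholders, not taken from the real site). It restricts the search to table#sortable, skips the header row (assumed to use <th> cells), and transposes the rows into one list per column, which is the shape the question asks for.

```python
from lxml import html

# Hypothetical page: a decoy table, then <table id="sortable"> with a header
# row and two data rows. The second data row is placeholder data.
PAGE = """
<html><body>
<table id="other"><tr><td>not wanted</td></tr></table>
<table id="sortable">
  <tr><th>Symbol</th><th>Name</th><th>Open</th></tr>
  <tr><td> AMTEKINDIA </td><td>AMTEK INDIA LIMITED</td><td>78</td></tr>
  <tr><td> SYMBOL2 </td><td>PLACEHOLDER NAME</td><td>1.0</td></tr>
</table>
</body></html>
"""

root = html.fromstring(PAGE)

# Only rows inside the table with id="sortable"; the header row has no
# <td> children, so it yields an empty list and is skipped.
table_rows = []
for tr in root.xpath("//table[@id = 'sortable']//tr"):
    cells = [td.text_content().strip() for td in tr.xpath("td")]
    if cells:
        table_rows.append(cells)

# Transpose the rows into one list per column.
symbols, names, opens = (list(column) for column in zip(*table_rows))

print(symbols)  # ['AMTEKINDIA', 'SYMBOL2']
print(opens)    # ['78', '1.0']
```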

from lxml import html

url = 'http://www.bigpaisa.com/candlestick-stock-screener-result/nse/bearish-evening-star-candlestick-pattern'
doc = html.parse(url)   # html.parse() fetches and parses the URL itself
root = doc.getroot()    # cssselect() is a method of elements, not of the parsed tree

# you can use XPath to select elements...
rows = root.xpath("//table[@id = 'sortable']/tbody/tr")

# or, if you prefer, use CSS selectors instead (requires the cssselect package)...
rows = root.cssselect("table#sortable tbody tr")

for tr in rows:
    # do something with each tr, for example print the text of its fifth cell
    tds = tr.cssselect("td")
    print(tds[4].text)
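One thing worth checking before relying on the /tbody/tr step: browsers add a <tbody> to tables in their DOM (which is what developer tools show), but lxml's parser only has one if it is actually in the page source. A sketch with a hypothetical table that has no <tbody> shows the difference; //tr (or the CSS selector table#sortable tr) works either way.

```python
from lxml import html

# Hypothetical table with no explicit <tbody> in the source.
SOURCE = '<table id="sortable"><tr><td>a</td></tr><tr><td>b</td></tr></table>'
root = html.fromstring(SOURCE)

# lxml does not insert <tbody>, so the explicit /tbody/ step matches nothing.
with_tbody = root.xpath("//table[@id = 'sortable']/tbody/tr")
# //tr matches rows at any depth, with or without a <tbody>.
any_depth = root.xpath("//table[@id = 'sortable']//tr")

print(len(with_tbody))  # 0
print(len(any_depth))   # 2
```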

Note that you don't need the requests module at all, because html.parse() can fetch the page from the URL itself.
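If you do want to keep requests (for example to set a timeout or to check the HTTP status), a sketch of the equivalent fetch-and-parse step follows. fetch_tree and its get parameter are hypothetical names; the real APIs used are requests.get, Response.raise_for_status, Response.content and lxml.html.fromstring. Passing page.content (bytes) rather than page.text is a deliberate choice here, so that lxml can work from the raw bytes of the page.

```python
from lxml import html


def fetch_tree(url, timeout=10, get=None):
    """Hypothetical helper: download `url` and parse it with lxml.html.

    `get` defaults to requests.get; it is a parameter only so the parsing
    step can be exercised without network access.
    """
    if get is None:
        import requests  # only needed when actually downloading
        get = requests.get

    page = get(url, timeout=timeout)
    page.raise_for_status()  # stop on 4xx/5xx rather than parse an error page
    # page.content is the raw bytes, so lxml can use the page's own charset
    # declaration rather than the encoding requests chose for page.text.
    return html.fromstring(page.content)
```

Usage would then be fetch_tree(url).xpath("//table[@id = 'sortable']//tr"), with url being the screener page from the question.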

Upvotes: 2
