Parsing xpath with python

Question

I'm trying to parse a web page that contains this:

(it continues with more rows and ends with [/table]). This code:
from lxml import html

# page holds the downloaded HTML source
tree = html.fromstring(page)
table = tree.xpath('//table/tr')
for item in table:
    for elem in item.xpath('*'):
        if 'colspan' in html.tostring(elem):
            print '*', elem.text
        elif elem.text is not None:
            print elem.text,
        else:
            print
somewhat works, but it does not get the text following the [br /], and it's far from elegant. How do I get the missing text? Any suggestions for improving the code would also be appreciated.

The table renders as:

February 20, 2015

9:00 PM    14°F
Clear
Precip: 0 %
Wind: from the WSW at 6 mph

10:00 PM   13°F
Clear
Precip: 0 %
Wind: from the WSW at 6 mph

alecxe · Accepted Answer

How about using .text_content()?

.text_content(): Returns the text content of the element, including the text content of its children, with no markup.

table = tree.xpath('//table/tr')
for item in table:
    print ' '.join(item.text_content().split())

The join() + split() combination collapses each run of whitespace (including newlines) into a single space.

It prints:

February 20, 2015
9:00 PM 14°F
Clear Precip: 0 % Wind: from the WSW at 6 mph
10:00 PM 13°F
Clear Precip: 0 % Wind: from the WSW at 6 mph
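
To see why elem.text alone misses the text after a [br /], here is a minimal sketch. It assumes Python 3 (print as a function), and the SAMPLE markup is hypothetical, since the page's actual HTML is not shown in the question. In lxml, text that follows a child element is stored in that child's .tail, not in the parent's .text; text_content() collects both.

```python
from lxml import html

# Hypothetical markup: the question's real HTML is not shown, so this
# layout (one cell with <br/> separators) is an assumption.
SAMPLE = """<html><body><table>
<tr><td>Clear<br/>
  Precip: 0 %<br/>
  Wind: from the WSW at 6 mph</td></tr>
</table></body></html>"""

tree = html.fromstring(SAMPLE)
cell = tree.xpath('//table/tr/td')[0]

# .text only covers the text before the first child element (the first <br>).
first_text = cell.text

# The text after each <br> is stored in that <br> element's .tail.
tails = [br.tail.strip() for br in cell.iter('br')]

# text_content() gathers .text plus every descendant's text and tail;
# split() + join() then collapses the leftover newlines and indentation.
full_text = ' '.join(cell.text_content().split())

print(first_text)
print(tails)
print(full_text)
```

Here first_text holds only "Clear", while full_text holds the whole cell on one line.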

Since you want to merge each time line with its precip line, you can iterate over the tr tags while skipping those that contain Precip in their text. For every time line, take the following tr sibling to get the matching precip line:

table = tree.xpath('//table/tr[not(contains(., "Precip"))]')
for item in table:
    text = ' '.join(item.text_content().split())
    if 'AM' in text or 'PM' in text:
        text += ' ' + ' '.join(item.xpath('following-sibling::tr')[0].text_content().split())

    print text

Prints:

February 20, 2015
9:00 PM 14°F Clear Precip: 0 % Wind: from the WSW at 6 mph
10:00 PM 13°F Clear Precip: 0 % Wind: from the WSW at 6 mph
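
The same idea can be tried end to end with a self-contained sketch. It assumes Python 3, and the SAMPLE markup is hypothetical (the real page's HTML is not shown). The helper names squash and merged_rows are made up for illustration. As a small variation on the code above, it selects only the nearest sibling with following-sibling::tr[1] and checks that this sibling actually mentions Precip, instead of testing for AM/PM.

```python
from lxml import html

# Hypothetical markup; the row and cell layout is an assumption.
SAMPLE = """<html><body><table>
<tr>
  <td colspan="2"> February 20, 2015 </td>
</tr>
<tr>
  <td> 9:00 PM </td>
  <td> 14°F </td>
</tr>
<tr>
  <td colspan="2"> Clear<br/>
    Precip: 0 %<br/>
    Wind: from the WSW at 6 mph </td>
</tr>
<tr>
  <td> 10:00 PM </td>
  <td> 13°F </td>
</tr>
<tr>
  <td colspan="2"> Clear<br/>
    Precip: 0 %<br/>
    Wind: from the WSW at 6 mph </td>
</tr>
</table></body></html>"""


def squash(element):
    """Return the element's text with runs of whitespace collapsed."""
    return ' '.join(element.text_content().split())


def merged_rows(tree):
    """Return one line per non-precip row, with its precip row appended."""
    lines = []
    for row in tree.xpath('//table/tr[not(contains(., "Precip"))]'):
        text = squash(row)
        # [1] keeps only the nearest following sibling row, if there is one.
        nxt = row.xpath('following-sibling::tr[1]')
        if nxt and 'Precip' in squash(nxt[0]):
            text += ' ' + squash(nxt[0])
        lines.append(text)
    return lines


lines = merged_rows(html.fromstring(SAMPLE))
for line in lines:
    print(line)
```

Checking the sibling's own text keeps the date row from being merged with the time row that follows it, and avoids an IndexError if a row has no following sibling.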
