Iterate through all the rows in a table using python lxml xpath

Question

I want to extract data from the following HTML page.

Webpage: http://gbgfotboll.se/information/?scr=table&ftid=51168 (the table is at the bottom of the page)

Kommande matcher (upcoming matches)

Tid (time)         Match                          Arena
2014-09-26 19:30   Guldhedens IK - IF Warta       Guldheden Södra 1 Konstgräs
2014-09-26 13:00   Romelanda UF - IK Virgo        Romevi 1 Gräs
2014-09-27 13:00   Kode IF - IK Kongahälla        Kode IP 1 Gräs
2014-09-27 14:00   Floda BoIF - Partille IF FK    Flodala IP 1

Right now I have this code, which produces the result I want:

import lxml.html
url = "http://gbgfotboll.se/information/?scr=table&ftid=51168"
html = lxml.html.parse(url)
for i in range(12):
    xpath1 = ".//*[@id='content-primary']/table[3]/tbody/tr[%d]/td[1]/span/span//text()" %(i+1)
    xpath2 = ".//*[@id='content-primary']/table[3]/tbody/tr[%d]/td[2]/a/text()" %(i+1)
    time = html.xpath(xpath1)[1]
    date = html.xpath(xpath1)[0]
    teamName = html.xpath(xpath2)[0]
    if date == '2014-09-27':
        print time, teamName

Gives the result:

13:00 Romelanda UF - IK Virgo

13:00 Kode IF - IK Kongahälla

14:00 Floda BoIF - Partille IF FK

Now to the question. I don't want to use a for loop with range because it isn't stable: the number of rows in that table can change, and if the index goes out of bounds the code will crash. How can I iterate safely, that is, over exactly the rows that are available in the table, no more and no less? Any other suggestions for making the code better or faster are also welcome.

Georges Martin · Accepted Answer

The following code iterates over the rows regardless of how many there are. rows_xpath filters directly on the target date, so only the matching rows are returned. The XPath expressions are also compiled once, outside the for loop, so it should be faster.

import lxml.html
from lxml.etree import XPath
url = "http://gbgfotboll.se/information/?scr=table&ftid=51168"
date = '2014-09-27'

rows_xpath = XPath("//*[@id='content-primary']/table[3]/tbody/tr[td[1]/span/span//text()='%s']" % (date))
time_xpath = XPath("td[1]/span/span//text()[2]")
team_xpath = XPath("td[2]/a/text()")

html = lxml.html.parse(url)

for row in rows_xpath(html):
    time = time_xpath(row)[0].strip()
    team = team_xpath(row)[0]
    print time, team
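
For readers who want to try the idea without fetching the live page, here is a minimal sketch in Python 3 that runs against an inline HTML snippet. The snippet, the span/br nesting inside the first cell, and the helper name matches_on are assumptions made for illustration, not the real markup of the page. It also passes the date as an XPath variable ($date) instead of formatting it into the expression string, which is a variation on the code above.

```python
import lxml.html
from lxml.etree import XPath

# Hypothetical stand-in for the real page. The exact nesting of the
# date/time cell (span > span > text, <br>, text) is an assumption.
SAMPLE = """
<div id="content-primary">
  <table>
    <tbody>
      <tr><td><span><span>2014-09-26<br>19:30</span></span></td>
          <td><a href="#">Guldhedens IK - IF Warta</a></td></tr>
      <tr><td><span><span>2014-09-27<br>13:00</span></span></td>
          <td><a href="#">Kode IF - IK Kongahälla</a></td></tr>
      <tr><td><span><span>2014-09-27<br>14:00</span></span></td>
          <td><a href="#">Floda BoIF - Partille IF FK</a></td></tr>
    </tbody>
  </table>
</div>
"""

# Compiled once. The row expression keeps only rows whose first cell
# contains a text node equal to $date, however many rows exist.
rows_xpath = XPath(
    "//*[@id='content-primary']/table/tbody"
    "/tr[td[1]//text()[normalize-space()=$date]]"
)
# These two are relative: they are evaluated against each matched row.
cell_text_xpath = XPath("td[1]//text()")
team_xpath = XPath("string(td[2]/a)")


def matches_on(doc, date):
    """Return (time, team) tuples for every row dated `date`."""
    result = []
    for row in rows_xpath(doc, date=date):
        # Non-empty text nodes of the first cell: [date, time].
        parts = [t.strip() for t in cell_text_xpath(row) if t.strip()]
        time = parts[1] if len(parts) > 1 else ""
        result.append((time, team_xpath(row).strip()))
    return result


doc = lxml.html.fromstring(SAMPLE)
for time, team in matches_on(doc, "2014-09-27"):
    print(time, team)
```

Because the loop runs over whatever rows_xpath returns, a date with no matches simply yields an empty list rather than an index error.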
