Parsing results from html table

Question

I'm trying to match some data from an HTML response, but I'm not sure how to do it correctly. I'm using the following block of code to extract the access and groups information:

import requests
import lxml.etree as LE
import lxml.html as LH

url = "http://theurl"
r = requests.get(url,auth=('user', 'pass'))
html = r.text

root = LH.fromstring(html)
LE.strip_tags(root, 'b')
data_list = root.xpath("""//td[text()='grouplist']
                             /following-sibling::*""")[0]

accessList= data_list.xpath("""//td[text()='access']
                                 /following-sibling::*/text()""")

groups = data_list.xpath("""//td[text()='groups']
                                 /following-sibling::*/text()""")

If I print accessList, I get the data that I want:

print accessList
['Administrators', 'group_a', 'group_b', 'group_c']

But when I print groups, the result is:

print groups
['
','
','
']

Given that, what can I do to get:

print groups
['group_a', 'group_b', 'group_c']

Here is the HTML that is returned:

   grouplist

      access
      Administrators

      inUse
      true

      groups
            group_a
            group_b
            group_c

      deny

EDIT: The HTML code can be tested here: html tester

Thanks in advance.

unutbu · Accepted Answer

groups = data_list.xpath("""//td[text()='groups']
                                 /following-sibling::td/table/tr/td/text()""")

or, a little less specifically,

groups = data_list.xpath("""//td[text()='groups']
                                 /following-sibling::*//td/text()""")

works. If that is too specific for your purpose, you could instead define groups this way:

groups = data_list.xpath("""//td[text()='groups']
                                 /following-sibling::*""")[0]

and then use text_content:

groups = groups.text_content().split()
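
As a rough sketch, the three variants above can be compared on a small piece of sample markup. The page's real tags are not shown in the question, so the nesting in SAMPLE is an assumption inferred from the td/table/tr/td path; the variable names are made up for illustration.

```python
import lxml.html as LH

# Hypothetical markup: this nesting is an assumption, inferred from the
# td/table/tr/td path used in the XPath expressions above.
SAMPLE = """
<table>
  <tr>
    <td>groups</td>
    <td>
      <table>
        <tr><td>group_a</td></tr>
        <tr><td>group_b</td></tr>
        <tr><td>group_c</td></tr>
      </table>
    </td>
  </tr>
</table>
"""

root = LH.fromstring(SAMPLE)

# The query from the question: text() only returns text nodes that are direct
# children of the sibling cell, i.e. the whitespace around the nested table.
original = root.xpath("//td[text()='groups']/following-sibling::*/text()")

# Spelling out the full path down to the inner cells.
specific = root.xpath(
    "//td[text()='groups']/following-sibling::td/table/tr/td/text()")

# Accepting any td below the sibling, at any depth.
looser = root.xpath("//td[text()='groups']/following-sibling::*//td/text()")

# Taking the sibling element itself and splitting its combined text.
sibling = root.xpath("//td[text()='groups']/following-sibling::*")[0]
via_text_content = sibling.text_content().split()

print(specific)
print(looser)
print(via_text_content)
```

Under this assumed markup all three print the three group names, while the original query yields only whitespace strings.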

However, splitting the text content on whitespace may not work well if group_a, group_b and/or group_c were replaced with text that itself contains whitespace.
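
One possible way around that, not part of the answer above, is to select the inner td elements and strip each cell's text separately. A minimal sketch, again on assumed markup, where "power users" is an invented group name containing a space:

```python
import lxml.html as LH

# Hypothetical markup; "power users" is an invented multi-word group name.
SAMPLE = """
<table>
  <tr>
    <td>groups</td>
    <td>
      <table>
        <tr><td>
          power users
        </td></tr>
        <tr><td>group_b</td></tr>
      </table>
    </td>
  </tr>
</table>
"""

root = LH.fromstring(SAMPLE)

# Splitting the combined text breaks the multi-word name into two items.
sibling = root.xpath("//td[text()='groups']/following-sibling::td")[0]
naive = sibling.text_content().split()

# Selecting the inner cells and stripping each one keeps the name intact.
cells = root.xpath("//td[text()='groups']/following-sibling::td//td")
per_cell = [cell.text_content().strip() for cell in cells]

print(naive)     # ['power', 'users', 'group_b']
print(per_cell)  # ['power users', 'group_b']
```

This still relies on each group sitting in its own td, which is an assumption about the page.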
