idclark

Reputation: 958

Collect values of child tag using Python lxml

I'm using the lxml library with Python 2.6 to extract data from an XML file. The document contains many <Employee> tags. I iterate over each <Employee> tag, create a new instance of my Employee class, and set its member variables from the attributes of that tag.

    read_CA_tree = etree.parse(xml_tree, parser)
    all_employees = []
    for employee_tag in read_CA_tree.iter("Employee"):
        employee = Employee(employee_tag)
        all_employees.append(employee)

The <Employee> tag may also have one or more <EmailAddress> child tags like so:

<Employee ID="124" Name="Foo Bar" Title="Baz">
    <EmailAddress ID="124" Address="[email protected]" />
</Employee>

My Employee object is populated in its constructor using the get() method of lxml's Element class:

class Employee(object):

    def __init__(self, employee_tag):
        self.Employee_ID = employee_tag.get("EmployeeID")
        self.First_Name = employee_tag.get("FirstName")
        self.Email_Addresses = self._collect_emails(read_CA_tree, "EmailAddress")

    def _collect_emails(self,tree,tag):
        known_addr = []
        for i in tree.iter(tag):
            known_addr.append(i)
        return known_addr

For each Employee tag, how can I collect the value(s) of Address within the child <EmailAddress> tag and add a list of email addresses to my Employee class constructor?

Upvotes: 0

Views: 230

Answers (1)

David Zemens

Reputation: 53623

From the docs:

Elements carry attributes as a dict

So, you can try:

def _collect_emails(self, tree, tag):
    email_addr = []
    for i in tree.iter(tag):
        # get() returns the attribute value, or '' if Address is missing
        email_addr.append(i.get('Address', ''))
    # return the address strings rather than the Element objects
    return email_addr
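Note that if you pass the whole tree into the method, every employee ends up with every address in the document. Below is a minimal, self-contained sketch of one way to avoid that, assuming you call iter() on the individual <Employee> element instead. The sample XML, the placeholder addresses, the <Employees> wrapper element, and the lower-case attribute names on the class are made up for illustration, and the stdlib fallback import is only there so the snippet runs without lxml installed:

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Stdlib fallback; it offers the same fromstring()/iter()/get() calls
    import xml.etree.ElementTree as etree

# Hypothetical sample data; the addresses are placeholders
SAMPLE_XML = """
<Employees>
  <Employee ID="124" Name="Foo Bar" Title="Baz">
    <EmailAddress ID="124" Address="foo@example.com" />
    <EmailAddress ID="124" Address="foo.bar@example.com" />
  </Employee>
  <Employee ID="125" Name="No Mail" Title="Qux" />
</Employees>
"""


class Employee(object):

    def __init__(self, employee_tag):
        self.employee_id = employee_tag.get("ID")
        self.name = employee_tag.get("Name")
        # Search only this employee's subtree, not the whole document
        self.email_addresses = self._collect_emails(employee_tag, "EmailAddress")

    def _collect_emails(self, element, tag):
        # iter(tag) on an element yields matching descendants of that element
        return [child.get("Address", "") for child in element.iter(tag)]


root = etree.fromstring(SAMPLE_XML.strip())
all_employees = [Employee(tag) for tag in root.iter("Employee")]

for employee in all_employees:
    print("%s: %s" % (employee.name, employee.email_addresses))
```

An employee with no <EmailAddress> children simply gets an empty list, so the calling code does not need a special case.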

Upvotes: 2
