Reputation: 1173

Python lxml: Items without .text attribute returned when querying for node()

I am trying to parse certain tags out of an XML document, and it is raising AttributeError: '_ElementStringResult' object has no attribute 'text'.

Here is the XML document:

<?xml version='1.0' encoding='ASCII'?>
<Root>
  <Data>
    <FormType>Log</FormType>
    <Submitted>2012-03-19 07:34:07</Submitted>
    <ID>1234</ID>
    <LAST>SJTK4</LAST>
    <Latitude>36.7027777778</Latitude>
    <Longitude>-108.046111111</Longitude>
    <Speed>0.0</Speed>
  </Data>
</Root>

Here is the code I am using:

from lxml import etree
from StringIO import StringIO
import MySQLdb
import glob
import os
import shutil
import logging
import sys

localPath = "C:\data"
xmlFiles = glob.glob1(localPath,"*.xml")
for file in xmlFiles:
    a = os.path.join(localPath,file)
    element = etree.parse(a)

    Data = element.xpath('//Root/Data/node()')
    parsedData = [{field.tag: field.text for field in Data} for action in Data]  # AttributeError: '_ElementStringResult' object has no attribute 'text'

print parsedData

Upvotes: 0

Answers (2)

Charles Duffy

Reputation: 295272

Instead of querying for //Root/Data/node(), query for /Root/Data/* if you want only elements (as opposed to text nodes) to be returned. (Also, using a single leading / rather than // lets the engine do a cheaper search, since it doesn't need to look through the whole subtree for additional Root elements.)
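A minimal sketch of the difference between the two queries, assuming Python 3 with lxml installed (the question itself uses Python 2). It parses the question's sample document from bytes, since lxml rejects str input that carries an encoding declaration. The names all_nodes and elements are illustrative only.

```python
import io
from lxml import etree

# Sample document from the question, as bytes so the XML declaration is accepted.
XML = b"""<?xml version='1.0' encoding='ASCII'?>
<Root>
  <Data>
    <FormType>Log</FormType>
    <Submitted>2012-03-19 07:34:07</Submitted>
    <ID>1234</ID>
    <LAST>SJTK4</LAST>
    <Latitude>36.7027777778</Latitude>
    <Longitude>-108.046111111</Longitude>
    <Speed>0.0</Speed>
  </Data>
</Root>"""

tree = etree.parse(io.BytesIO(XML))

# node() matches every child node: the elements AND the whitespace text between them.
all_nodes = tree.xpath('/Root/Data/node()')

# * matches element children only, so every result has .tag and .text.
elements = tree.xpath('/Root/Data/*')

print(len(all_nodes), len(elements))
print({el.tag: el.text for el in elements})
```

The only change needed is the node test at the end of the path; the dictionary comprehension then works because no string results are mixed in.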

Also -- are you sure you really want to loop through the entire list of subelements of Data inside your inner loop, rather than looping over only the subelements of a single Data element selected by your outer loop? I think your logic is broken, though it would only be visible if you had a file with more than one Data element under Root.
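One way to restructure the loop along the lines suggested above is to make the outer loop walk the Data records and the inner comprehension read only that record's children. This is a sketch assuming Python 3 and lxml; the two-record document below is hypothetical, added only so the per-record grouping is visible, and parsed_data/record are illustrative names.

```python
import io
from lxml import etree

# Hypothetical two-record document (not from the question).
XML = b"""<Root>
  <Data><ID>1234</ID><Speed>0.0</Speed></Data>
  <Data><ID>5678</ID><Speed>12.5</Speed></Data>
</Root>"""

tree = etree.parse(io.BytesIO(XML))

# Outer loop: one iteration per <Data> record.
# Inner comprehension: only the element children of *that* record ('*' is relative to record).
parsed_data = [
    {field.tag: field.text for field in record.xpath('*')}
    for record in tree.xpath('/Root/Data')
]

print(parsed_data)
# [{'ID': '1234', 'Speed': '0.0'}, {'ID': '5678', 'Speed': '12.5'}]
```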

Upvotes: 2

Lance Helsten

Reputation: 9997

'//Root/Data/node()' returns a list of all child nodes of Data, and that includes the text nodes (here, the whitespace between the elements), which lxml returns as strings that have no text attribute. If you put a print right after the Data = ... line, you will see something like ['\n ', <Element FormType at 0x10675fdc0>, '\n ', ...].
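To see that mix of result types directly, the sketch below (Python 3, lxml assumed installed, document trimmed from the question) prints each result's type and whether it has a .text attribute. Under Python 3 the text-node results are str subclasses, so the type name printed may differ from the _ElementStringResult seen in the question's Python 2 traceback.

```python
import io
from lxml import etree

# Trimmed version of the question's document, whitespace kept as in a pretty-printed file.
XML = b"<Root>\n  <Data>\n    <FormType>Log</FormType>\n    <ID>1234</ID>\n  </Data>\n</Root>"
tree = etree.parse(io.BytesIO(XML))

nodes = tree.xpath('//Root/Data/node()')
for node in nodes:
    # Element results carry .tag/.text; text-node results are string
    # subclasses and have neither attribute.
    print(type(node).__name__, repr(node), hasattr(node, 'text'))
```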

I would filter the results first, for example:

Data = [f for f in element.xpath('//Root/Data/node()') if hasattr(f, 'text')]

Then I think the following line could be rewritten as:

parsedData = {field.tag: field.text for field in Data}

which gives a dictionary mapping each element's tag to its text, which I believe is what you want.
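Putting this answer's two lines together against a trimmed copy of the question's document gives the sketch below. It assumes Python 3 with lxml installed and keeps the question's variable names (element, Data, parsedData).

```python
import io
from lxml import etree

XML = b"""<Root>
  <Data>
    <FormType>Log</FormType>
    <ID>1234</ID>
    <Speed>0.0</Speed>
  </Data>
</Root>"""

element = etree.parse(io.BytesIO(XML))

# Drop the text-node strings, keeping only results that have a .text attribute.
Data = [f for f in element.xpath('//Root/Data/node()') if hasattr(f, 'text')]

# A single dict mapping each child element's tag to its text content.
parsedData = {field.tag: field.text for field in Data}

print(parsedData)
# {'FormType': 'Log', 'ID': '1234', 'Speed': '0.0'}
```

If the files might contain XML comments, note that lxml comment nodes also have a .text attribute, so they would pass this hasattr filter; selecting with /Root/Data/* instead matches element nodes only.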

Upvotes: 2
