xyiong

Reputation: 373

Python lxml error when iterating through xml file

I have an xml file like this:

<location type="journal">
???INSERT location???
    <journal title="J. Gen. Virol.">
        <volumn> 84 </volumn>
        <page start="2305" end="2315"/>
        <year> 2003 </year>
    </journal>
</location>

I am iterating through the file like so:

import re

import networkx as nx
from lxml import etree

tree_out = etree.parse('xmlfile.xml')
updatedtext_head = '???UPDATE FROM '
insert_head = '???INSERT '
delete_head = '???DELETE '

updatedattrib_head = '???UPDATE '
updatedattrib_mid = ' FROM '
mark_end = '???'

every = 60

G = nx.DiGraph()

color_list = []
node_text = []
inserted_out = []
deleted_out = []
updatedtext_out = []
others_out = []
updatedattrib_out = []
old_new_attrib_pairs = []
full_texts = []

for x in tree_out.iter():

    for y in x.iterancestors():
        if '???DELETE' in y.text and x not in deleted_out:
            deleted_out.append(x)

    if '???DELETE' in x.text and x not in deleted_out:
        deleted_out.append(x)

    for y in x.iterancestors():
        if '???INSERT' in y.text and x not in inserted_out:
            inserted_out.append(x)

    if '???INSERT' in x.text and x not in inserted_out:
        inserted_out.append(x)

    if '???UPDATE FROM' in x.text and x not in updatedtext_out:
        updatedtext_out.append(x)

    if '???UPDATE ' in x.text and ' FROM ' in x.text and '???' in x.text and x not in updatedattrib_out and x not in updatedtext_out:
        updatedattrib_out.append(x)

    if (re.search(r'^\s+$', x.text)) and x not in others_out and x not in deleted_out and x not in inserted_out and x not in updatedtext_out and x not in updatedattrib_out:
        others_out.append(x)

However, when the loop reaches an element such as this:

<page start="2305" end="2315"/>

I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-68-b66a7d063b5b> in <module>
    121             deleted_out.append(x)
    122 
--> 123     if '???DELETE' in x.text and x not in deleted_out:
    124             deleted_out.append(x)
    125 

TypeError: argument of type 'NoneType' is not iterable

The goal is to sort the elements of the tree into the separate lists shown in the code above. Why does this error occur and how can I fix it?

Upvotes: 1

Views: 127

Answers (1)

Michael Ruth

Reputation: 3514

Edit

The TypeError is caused by the attribute-only element <page start="2305" end="2315"/>. When the loop reaches it, x refers to that element and the code tests whether '???DELETE' occurs in x.text. The text attribute holds the element's text content, and because this element has no content, x.text is None. For reference, XML elements have the following structure:

<element-name attribute1="value1" attribute2="value2">content</element-name>

The error message is argument of type 'NoneType' is not iterable because the in operator has the form value in container. The right-hand operand, here x.text, must be a container or iterable such as a str, and None is neither.
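To see the failure in isolation, here is a minimal sketch that does not need lxml at all. The name page_text is a stand-in (an assumption, not from the question) for the value of x.text on the empty page element:

```python
# Stand-in for x.text on <page start="2305" end="2315"/>, which has no content.
page_text = None

raised = False
try:
    "???DELETE" in page_text  # the same membership test the loop performs
except TypeError:
    raised = True  # None cannot appear on the right-hand side of "in"

print(raised)

# With a str on the right-hand side the test is an ordinary substring check.
print("???DELETE" in " 84 ")
```

The exact wording of the exception message can differ between Python versions, so the sketch only checks the exception type.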

You should test that x.text isn't None before trying to use it like a str.

if x.text is not None and '???DELETE' in x.text and x not in deleted_out:
    deleted_out.append(x)
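Every other test in the loop reads x.text or y.text in the same way, so each one needs the same guard. One way to avoid repeating it is a small helper that turns None into an empty string. The sketch below is only an illustration: text_of is a hypothetical helper, and it uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without lxml installed (both libraries report None for the text of an empty element).

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree

XML = """<location type="journal">
???INSERT location???
    <journal title="J. Gen. Virol.">
        <volumn> 84 </volumn>
        <page start="2305" end="2315"/>
        <year> 2003 </year>
    </journal>
</location>"""


def text_of(element):
    """Hypothetical helper: return element.text, or '' when it is None."""
    return element.text or ""


root = ET.fromstring(XML)
inserted_out = []

for x in root.iter():
    # text_of() never returns None, so the membership test is always valid.
    if "???INSERT" in text_of(x) and x not in inserted_out:
        inserted_out.append(x)

page = root.find("journal/page")
print(page.text)                        # None
print([el.tag for el in inserted_out])  # ['location']
```

With lxml, the same helper can wrap y.text inside the iterancestors() loops as well, since an ancestor can also have no text.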

Original

You never declared deleted_out. Try this:

tree_out = etree.parse('xmlfile.xml')
deleted_out = []

for x in tree_out.iter():
        
    for y in x.iterancestors():
        if '???DELETE' in y.text and x not in deleted_out:
            deleted_out.append(x)

Upvotes: 1
