Reputation: 1367
I am trying to extract the contents of specific tags in an XML file.
Sample XML:
<facts>
  <fact>
    <name>crash</name>
    <full_name>Crash</full_name>
    <variables>
      <variable>
        <name>id</name>
        <proper_name>Crash Instance</proper_name>
        <type>INT</type>
        <interpretation>key</interpretation>
      </variable>
      <variable>
        <name>accident_key</name>
        <proper_name>Case Identifier</proper_name>
        <interpretation>string</interpretation>
        <type>CHAR(9)</type>
      </variable>
      <variable>
        <name>accident_year</name>
        <proper_name>Crash Year</proper_name>
        <interpretation>dim</interpretation>
        <type>INT</type>
      </variable>
    </variables>
  </fact>
  <fact>
    <name>vehicle</name>
    <full_name>Vehicle</full_name>
    <variables>
      <variable>
        <name>id</name>
        <proper_name>Vehicle Instance</proper_name>
        <type>INT</type>
      </variable>
      <variable>
        <name>crash_id</name>
        <proper_name>Crash Instance</proper_name>
        <type>INT</type>
      </variable>
    </variables>
  </fact>
</facts>
I want to pull the contents of the <name> tag from every <variable> node, but only within the Crash fact.
Here is my code so far.
def header(filename, fact):
    lst = []
    tree = ET.parse(filename)  # read in the XML
    for fact in tree.iter(tag = 'fact'):
        factname = fact.find('name').text
        if factname == fact:  # choose the fact to pull from
            for var in fact.iter(tag = 'variable'):
                name = var.find('name').text
                lst.append(name)
    return lst  # return a list of all the <name> tags from the Crash fact

newlst = header('schema.xml','crash')
My output, newlst, should be a list of all the <name> tags from the variables in the Crash fact. But it keeps coming back empty.
Strangely, it returns the correct output if I hard-code everything (and remove the function):
lst = []
tree = ET.parse('schema.xml')
for fact in tree.iter(tag = 'fact'):
    factname = fact.find('name').text
    if factname == 'crash':
        for var in fact.iter(tag = 'variable'):
            name = var.find('name').text
            lst.append(name)
print(lst)
Output: ['id', 'accident_key', 'accident_year']
Upvotes: 4
Views: 12146
Reputation: 9244
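In the function, you're using the name fact both as a parameter and as the loop variable of the first for loop. As soon as the loop starts, fact refers to the current <fact> element rather than the string 'crash' you passed in, so factname == fact compares a string with an Element and is never true. That's why the list stays empty. Give the parameter a different name. Try this version: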
import xml.etree.ElementTree as ET

def header(filename, target_factname):
    lst = []
    tree = ET.parse(filename)  # read in the XML
    for fact in tree.iter(tag = 'fact'):
        factname = fact.find('name').text
        if factname == target_factname:  # choose the fact to pull from
            for var in fact.iter(tag = 'variable'):
                name = var.find('name').text
                lst.append(name)
    return lst  # return a list of all the <name> tags from the chosen fact
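If you'd rather not write the nested loops at all, ElementTree's limited XPath support can probably do the filtering for you. This is only a sketch: header_from_root and SAMPLE are names I made up for illustration, the XML is a trimmed copy of your sample, and it assumes the fact name contains no quote characters (they would break the path string).

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the sample XML from the question (illustration only).
SAMPLE = """
<facts>
  <fact>
    <name>crash</name>
    <variables>
      <variable><name>id</name></variable>
      <variable><name>accident_key</name></variable>
      <variable><name>accident_year</name></variable>
    </variables>
  </fact>
  <fact>
    <name>vehicle</name>
    <variables>
      <variable><name>id</name></variable>
      <variable><name>crash_id</name></variable>
    </variables>
  </fact>
</facts>
"""

def header_from_root(root, target_factname):
    # Hypothetical helper. The [name='...'] predicate keeps only <fact>
    # elements that have a <name> child whose text equals target_factname.
    path = f".//fact[name='{target_factname}']/variables/variable/name"
    return [el.text for el in root.findall(path)]

root = ET.fromstring(SAMPLE)
print(header_from_root(root, "crash"))  # ['id', 'accident_key', 'accident_year']
```

With a file on disk you would pass ET.parse(filename).getroot() instead of the parsed string. Because the parameter is never reused as a loop variable here, the shadowing problem can't come up.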
Upvotes: 5