Reputation: 129
I have this XML file:
<SESSION_INFO>
<start_time>2018-10-16 22:44:38.36 -0500</start_time>
</SESSION_INFO>
<ALL_INSTANCES>
<instance>
<ID>1</ID>
<start>4.3974745990</start>
<end>13.6332131403</end>
<code>Button 013</code>
<label>
<text>1,2</text>
</label>
<label>
<text>0,4</text>
</label>
<label>
<text>2,3</text>
</label>
</instance>
<instance>
<ID>2</ID>
<start>513.0491021980</start>
<end>524.9834182373</end>
<code>Button 013</code>
<label>
<text>1,2</text>
</label>
<label>
<text>1,4</text>
</label>
<label>
<text>1,3</text>
</label>
<label>
<text>0,1</text>
</label>
<label>
<text>1,3</text>
</label>
<label>
<text>0,4</text>
</label>
</instance>
</ALL_INSTANCES>
I wrote some code to extract all the data from /label/text and put it in a list:
import xml.etree.ElementTree as ET

tree = ET.parse('/Desktop/XML Edit list.xml')
root = tree.getroot()
labels = []
for each in root.findall('.//ALL_INSTANCES/instance/label'):
    rating = each.find('.//text')
    # Report labels without a <text> child instead of appending None
    if rating is None:
        print('Empty')
    else:
        labels.append(rating.text)
print(labels)
The next step, which I can't get my head around, is to create a separate list of the labels in each instance (2 in this example). I think I need a for loop that goes into each instance, pulls out the data and writes it into a list that is then appended to labels[]. However, I cannot go through each instance separately; my attempts with .find and .get did not get me very far, and that was my best shot.
Thank you in advance for your help, Cronos
EDIT 1 Adding ideal output as per request:
[['1,2', '0,4', '2,3'], ['1,2', '1,4', '1,3', '0,1', '1,3', '0,4']]
EDIT 2 I previously tried to achieve this by adding another list inside the loop: values are first appended to all_labels, which is then reset to collect the values for the next instance. Something like:
all_labels = []
result = []
for child in root.iter():
    for instance in child.findall('instance'):
        for label in instance.findall('label'):
            all_labels = []
            for val in label.findall('text'):
                all_labels.append(val.text)
            result.append(all_labels)
But I cannot make it work.
EDIT 3 Almost got it, thanks to LeKhan9, who showed a simpler approach. Based on his idea, I created another list that saves the result of each loop. The output starts with an empty list, so it is not "clean":
all_labels = []
result = []
for child in root.iter():
    for instance in child.findall('instance'):
        result.append(all_labels)
        all_labels = []
        for label in instance.findall('label'):
            for val in label.findall('text'):
                all_labels.append(val.text)
result.append(all_labels)
print(result)
[[], ['1,2', '0,4', '2,3'], ['1,2', '1,4', '1,3', '0,1', '1,3', '0,4']]
Upvotes: 0
Views: 531
Reputation: 1350
You can always take a deliberate approach and parse each level of the tree as such:
from xml.etree import ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()
all_labels = []
for child in root.iter():
    for instance in child.findall('instance'):
        for label in instance.findall('label'):
            for val in label.findall('text'):
                all_labels.append(val.text)
print(all_labels)
Output:
['1,2', '0,4', '2,3', '1,2', '1,4', '1,3', '0,1', '1,3', '0,4']
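As a side note, the same flat list can probably be collected with a single path expression instead of nested loops. This is only a sketch: the `<ROOT>` wrapper and the shortened sample data are assumptions made to keep it self-contained, since the real file's root element is not shown in the question.

```python
import xml.etree.ElementTree as ET

# Shortened sample; the <ROOT> wrapper is an assumption, not the real root.
SAMPLE = """
<ROOT>
  <ALL_INSTANCES>
    <instance>
      <label><text>1,2</text></label>
      <label><text>0,4</text></label>
    </instance>
    <instance>
      <label><text>1,3</text></label>
    </instance>
  </ALL_INSTANCES>
</ROOT>
"""

root = ET.fromstring(SAMPLE)

# './/instance/label/text' matches every <text> that is a child of a
# <label> that is a child of an <instance>, at any depth below the root.
flat_labels = [node.text for node in root.findall('.//instance/label/text')]
print(flat_labels)
```

With a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring(SAMPLE)`.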
Updating based on OP's expected output:
from xml.etree import ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()
result = []
for child in root.iter():
    for instance in child.findall('instance'):
        # Start a fresh list for every instance
        current_labels = []
        for label in instance.findall('label'):
            for val in label.findall('text'):
                current_labels.append(val.text)
        result.append(current_labels)
print(result)
Output:
[['1,2', '0,4', '2,3'], ['1,2', '1,4', '1,3', '0,1', '1,3', '0,4']]
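If you prefer something more compact, the per-instance grouping can also be written as a nested list comprehension. Treat this as a sketch under assumptions rather than the definitive way: the helper name `labels_per_instance`, the `<ROOT>` wrapper and the shortened sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample; the <ROOT> wrapper is an assumption, not the real root.
SAMPLE = """
<ROOT>
  <SESSION_INFO>
    <start_time>2018-10-16 22:44:38.36 -0500</start_time>
  </SESSION_INFO>
  <ALL_INSTANCES>
    <instance>
      <ID>1</ID>
      <label><text>1,2</text></label>
      <label><text>0,4</text></label>
      <label><text>2,3</text></label>
    </instance>
    <instance>
      <ID>2</ID>
      <label><text>1,2</text></label>
      <label><text>1,4</text></label>
      <label></label>
    </instance>
  </ALL_INSTANCES>
</ROOT>
"""


def labels_per_instance(root):
    """Return one list of label texts per <instance> element (hypothetical helper)."""
    return [
        # 'label/text' only matches <text> children of direct <label> children,
        # so a <label> without a <text> contributes nothing.
        [text.text for text in instance.findall('label/text')]
        for instance in root.iter('instance')
    ]


root = ET.fromstring(SAMPLE)
result = labels_per_instance(root)
print(result)
```

Because each inner list is built inside the comprehension for a single instance, there is no shared list to reset between instances, which is what caused the leading empty list in EDIT 3.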
Upvotes: 1