Reputation: 415
The code below goes through a directory that contains XML files and parses them into a dataframe.
from xml.etree import ElementTree as ET
from collections import defaultdict
from pathlib import Path
import csv
directory = 'C:/Users/xml_files'
with open('try.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=';')
    # writer = csv.writer(f)
    headers = ['identify','id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt','Counter', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
    writer.writerow(headers)
    xml_files_list = list(map(str,Path(directory).glob('**/*.xml')))
    for xml_file in xml_files_list:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        p_get = tree.find('.//Phones/Get').text
        p_set = tree.find('.//Phones/Set').text
        start_nodes = root.findall('.//START')
        for sn in start_nodes:
            row = defaultdict(str)
            # <<<<< Indentation was wrong here
            for k,v in sn.attrib.items():
                row[k] = v
            for rn in sn.findall('.//Rational'):
                row['Rational'] = rn.text
            for qu in sn.findall('.//Qualify'):
                row['Qualify'] = qu.text
            for ds in sn.findall('.//Description'):
                row['Description_txt'] = ds.text
                row['Description_text_id'] = ds.attrib['text_id']
            for counter, st in enumerate( sn.findall('.//SetData') ):
                for k,v in st.attrib.items():
                    if v.startswith("-"):
                        v = v.replace("-","",1)
                    v=v.replace(',', '.')
                    row['SetData_'+ str(k)] = v
                row["Counter"] = counter
                row_data = [row[i] for i in headers]
                row_data[0]=p_get + '_' + p_set
                writer.writerow(row_data)
                row = defaultdict(str)
With more data, it is really hard to just sit and wait without knowing how far the parsing has progressed, so I looked for a way to show a progress bar. I ended up finding the following:
import tqdm
import time
for i in tqdm.tqdm(range(1000)):
    time.sleep(0.01)
    # or other long operations
I am having trouble integrating this into my code and working out the range, which should preferably be the number of XML files in that directory.
The tqdm library seemed like the easiest one to implement.
Upvotes: 0
Views: 2620
Reputation: 143062
You could use
    for xml_file in tqdm.tqdm(xml_files_list):
It should automatically use len(xml_files_list) as the total, and each iteration will still give you xml_file.
You also don't need sleep(). It was used in the documentation only to slow down the loop for the example.
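As a rough sketch of that suggestion, the snippet below wraps the file list in tqdm and does a simplified version of the parsing step. The function name count_start_nodes, the desc and unit labels, and the fallback wrapper are illustrative assumptions and not part of the original code. It imports the class with `from tqdm import tqdm`, which is equivalent to calling `tqdm.tqdm` after `import tqdm`.

```python
from pathlib import Path
from xml.etree import ElementTree as ET

try:
    from tqdm import tqdm
except ImportError:
    # Assumption for this sketch only: if tqdm is not installed,
    # iterate normally without showing a progress bar.
    def tqdm(iterable, **kwargs):
        return iterable


def count_start_nodes(directory):
    """Hypothetical helper: count START elements in each XML file under directory."""
    # Building a list first gives tqdm a length, so it can infer the total.
    xml_files_list = list(map(str, Path(directory).glob('**/*.xml')))
    counts = {}
    # The bar advances once per file; no sleep() is needed.
    for xml_file in tqdm(xml_files_list, desc='Parsing XML', unit='file'):
        root = ET.parse(xml_file).getroot()
        counts[Path(xml_file).name] = len(root.findall('.//START'))
    return counts
```

In the question's code, the only change needed is the `for xml_file in ...` line; the body of the loop stays the same. If you pass the glob generator directly instead of a list, tqdm cannot call len() on it, so you would have to supply `total=` yourself to get a percentage.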
Upvotes: 1