E199504

Reputation: 415

Progress bar while parsing files

The code below walks a directory of XML files, parses each one, and writes the extracted rows to a CSV file (which I later load into a dataframe).

from xml.etree import ElementTree as ET
from collections import defaultdict
from pathlib import Path
import csv


directory = 'C:/Users/xml_files'

with open('try.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=';')
    # writer = csv.writer(f)

    headers = ['identify','id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt','Counter', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']

    writer.writerow(headers)

    xml_files_list = list(map(str,Path(directory).glob('**/*.xml')))
    for xml_file in xml_files_list:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        p_get = tree.find('.//Phones/Get').text
        p_set = tree.find('.//Phones/Set').text


        start_nodes = root.findall('.//START')
        for sn in start_nodes:
            row = defaultdict(str)

            # <<<<< Indentation was wrong here
            for k,v in sn.attrib.items():
                row[k] = v
            for rn in sn.findall('.//Rational'):
                row['Rational'] = rn.text

            for qu in sn.findall('.//Qualify'):
                row['Qualify'] = qu.text

            for ds in sn.findall('.//Description'):
                row['Description_txt'] = ds.text
                row['Description_text_id'] = ds.attrib['text_id']



            for counter, st in enumerate( sn.findall('.//SetData') ):
                for k,v in st.attrib.items():
                    if v.startswith("-"):
                        v = v.replace("-","",1)
                    v=v.replace(',', '.')
                    row['SetData_'+ str(k)] = v
                row["Counter"] = counter 
                row_data = [row[i] for i in headers]
                row_data[0]=p_get + '_' + p_set
                writer.writerow(row_data)
                row = defaultdict(str)

With more data, it is really hard to just wait without knowing how far the parsing has progressed.

So I looked for a way to show a progress bar, and ended up finding the following:

import tqdm
import time

for i in tqdm.tqdm(range(1000)):
    time.sleep(0.01)
    # or other long operations

I am having problems fitting this into my code and working out the range, which preferably would be the number of XML files in that directory.

This library tqdm seemed like the easiest one to implement.

Upvotes: 0

Views: 2620

Answers (1)

furas

Reputation: 143062

You could wrap the list in tqdm:

for xml_file in tqdm.tqdm(xml_files_list):

It should automatically use len(xml_files_list) as the total, and it still yields each xml_file, so the body of the loop does not change.

And you don't need sleep(). It was used in the documentation only to slow the loop down for the example.
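To make that concrete, here is a minimal sketch of the same idea in a cut-down version of the loop from the question. The helper name parse_directory, the output columns, and the no-op fallback for when tqdm is not installed are all assumptions made for illustration; they are not part of the original code.

```python
import csv
from pathlib import Path
from xml.etree import ElementTree as ET

try:
    from tqdm import tqdm
except ImportError:
    # Assumption: if tqdm is not installed, fall back to a wrapper that
    # just returns the iterable, so the loop still runs without a bar.
    def tqdm(iterable, **kwargs):
        return iterable


def parse_directory(directory, out_csv):
    """Hypothetical helper: write one row per XML file, return the file count."""
    # A list (not a generator) so that tqdm can call len() to get the total.
    xml_files_list = list(Path(directory).glob('**/*.xml'))

    with open(out_csv, 'w', newline='') as f:
        writer = csv.writer(f, delimiter=';')
        writer.writerow(['file', 'start_nodes'])

        # tqdm wraps the list and yields the same items one by one.
        for xml_file in tqdm(xml_files_list, desc='Parsing', unit='file'):
            root = ET.parse(str(xml_file)).getroot()
            writer.writerow([xml_file.name, len(root.findall('.//START'))])

    return len(xml_files_list)
```

Calling parse_directory('C:/Users/xml_files', 'try.csv') should then show one progress step per file. If you iterate over the glob() generator directly instead of a list, tqdm cannot know the length up front, so you would likely need to pass total= yourself.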

Upvotes: 1
