Max FH

Reputation: 59

How do I write a new, cleaned file for every file in a list?

# Working code for one file
import os

directory = r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output'
list1 = os.listdir(directory)
length = len(list1)
print(list1)
print(list1[0])

with open(r"C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output\1.xml") as infile, open(r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\out\output.xml', 'w') as outfile:
    for line in infile:
        if not line.strip():
            continue  # skip empty lines
        outfile.write(line)

Directory contents (list1):

['0.xml', '1.xml', '2.xml', '3.xml']

Desired output:

Four XML files without blank lines.

How can I do this for every file in list1? I'm stuck, please help.

Upvotes: 0

Views: 80

Answers (2)

Booboo

Reputation: 44223

If you want to update the files in place, I recommend making backups first: a failure in the middle of an update could leave a file in an inconsistent state. The following removes empty lines from every file with the .xml extension (you can change the glob pattern to match whatever you want):

from pathlib import Path


for path in Path(r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output').glob('*.xml'):
    with open(path, 'r+') as f:
        lines = f.readlines()
        f.seek(0, 0)
        for line in lines:
            if line.strip() != '':
                f.write(line)
        f.truncate()
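
The backup step mentioned above could be folded into the same loop. This is only a sketch: the function name strip_blank_lines_in_place and the .bak suffix are made up for illustration, and it assumes the files are text that fits comfortably in memory.

```python
import shutil
from pathlib import Path


def strip_blank_lines_in_place(folder, pattern="*.xml"):
    """Back up each matching file as <name>.bak, then rewrite it without blank lines."""
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        # Hypothetical naming scheme: 1.xml -> 1.xml.bak (not matched by *.xml)
        backup = path.with_name(path.name + ".bak")
        shutil.copy2(path, backup)
        lines = path.read_text().splitlines(keepends=True)
        # Keep only lines that contain something other than whitespace
        path.write_text("".join(line for line in lines if line.strip()))
        changed.append(path.name)
    return changed
```

Calling strip_blank_lines_in_place(r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output') would leave a .bak copy next to each rewritten file, which you can delete once you have checked the results.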

If you want new, numbered output files, then:

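from pathlib import Path


# Snapshot the input list first so the new output{n}.xml files, which land in
# the same folder, are not picked up by the glob while the loop is running.
paths = list(Path(r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output').glob('*.xml'))
for n, path in enumerate(paths):
    with open(path, 'r') as infile, open(fr'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output\output{n}.xml', 'w') as outfile:
        for line in infile:
            if line.strip() != '':
                outfile.write(line)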
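
If you would rather keep the original file names and write the cleaned copies into a separate folder (the question uses an out folder next to output), something along these lines might work. Treat it as a sketch: copy_without_blank_lines and its parameters are names invented for this example.

```python
from pathlib import Path


def copy_without_blank_lines(src_dir, dst_dir, pattern="*.xml"):
    """Write a blank-line-free copy of each matching file in src_dir to dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    written = []
    # sorted() builds the full list of inputs before any file is written
    for path in sorted(Path(src_dir).glob(pattern)):
        target = dst / path.name  # same file name, different folder
        with open(path) as infile, open(target, "w") as outfile:
            outfile.writelines(line for line in infile if line.strip())
        written.append(target.name)
    return written
```

For the folders in the question that would be copy_without_blank_lines(r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output', r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\out'). Because the inputs are never modified, no backup is needed with this approach.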

Upvotes: 1

shekhar chander

Reputation: 618

Here's a solution.

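import os

directory = r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output'
out_dir = r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\out'  # output folder from the question; must already exist
for name in os.listdir(directory):
    # Copy each file to out_dir under the same name, skipping blank lines
    with open(os.path.join(directory, name)) as infile, open(os.path.join(out_dir, name), 'w') as outfile:
        for line in infile:
            if line.strip():
                outfile.write(line)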

Upvotes: 0
