# Working code for one file
import os

directory = r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output'
list1 = os.listdir(directory)
length = len(list1)
print(list1)
print(list1[0])
with open(r"C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output\1.xml") as infile, open(r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\out\output.xml', 'w') as outfile:
    for line in infile:
        if not line.strip():
            continue  # skip the empty line
        outfile.write(line)
Directory contents:
['0.xml', '1.xml', '2.xml', '3.xml']
Desired output:
Four XML files without the empty (whitespace-only) lines.
How can I do this for every file in list1? I'm stuck, please help.
If you want to update the files in place, the following removes empty lines from every file with the .xml extension (change the glob pattern to match whatever files you want). I'd first recommend making backups, in case a failure partway through an update leaves a file in an inconsistent state:
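from pathlib import Path

for path in Path(r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output').glob('*.xml'):
    with open(path, 'r+') as f:
        lines = f.readlines()  # read everything first
        f.seek(0, 0)           # then rewind and overwrite from the start
        for line in lines:
            if line.strip() != '':  # keep only non-blank lines
                f.write(line)
        f.truncate()           # drop the leftover tail of the old, longer content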
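To avoid the inconsistent-state risk mentioned above without relying on backups, each file can instead be written to a temporary file and then swapped in with os.replace(), so a crash while writing leaves the original untouched. This is a swapped-in technique rather than the code above, and the function name strip_blank_lines_in_place is hypothetical; treat it as a sketch:

```python
import os
import shutil
import tempfile
from pathlib import Path


def strip_blank_lines_in_place(path: Path) -> None:
    """Remove empty/whitespace-only lines from `path` via a temp file and os.replace()."""
    # Put the temp file next to the original so the final rename stays on
    # one filesystem; the .tmp suffix keeps it out of a '*.xml' glob.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix='.tmp')
    try:
        # os.fdopen takes ownership of fd, so it is closed when the block exits.
        with os.fdopen(fd, 'w') as outfile, open(path, 'r') as infile:
            for line in infile:
                if line.strip():  # keep only lines with visible content
                    outfile.write(line)
        shutil.copymode(path, tmp_name)  # keep the original file's permission bits
        # Swap the finished file in; until this point the original is unchanged.
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the partial temp file and re-raise; the original is intact.
        os.remove(tmp_name)
        raise
```

Usage is the same glob loop as above, with the body replaced by a call to strip_blank_lines_in_place(path).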
If you want new, numbered output files, then:
from pathlib import Path

src = Path(r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output')
dst = Path(r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\out')  # separate folder, so outputs are never globbed as inputs
dst.mkdir(exist_ok=True)
for n, path in enumerate(sorted(src.glob('*.xml'))):
    with open(path, 'r') as infile, open(dst / f'output{n}.xml', 'w') as outfile:
        for line in infile:
            if line.strip() != '':  # skip empty lines
                outfile.write(line)
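The numbered-output approach can also be wrapped in a reusable function. This is a minimal sketch: the name clean_to_numbered is hypothetical, and it assumes the outputs go to a separate folder so they can't be picked up as inputs on a later run. Because it numbers files in sorted filename order, note that '10.xml' would sort before '2.xml'.

```python
from pathlib import Path


def clean_to_numbered(src_dir: Path, dst_dir: Path) -> list[Path]:
    """Write a blank-line-free copy of each *.xml in src_dir as dst_dir/output<n>.xml."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # sorted() fixes the numbering order and snapshots the input list up front.
    for n, path in enumerate(sorted(src_dir.glob('*.xml'))):
        target = dst_dir / f'output{n}.xml'
        with open(path, 'r') as infile, open(target, 'w') as outfile:
            # Keep only lines that contain something other than whitespace.
            outfile.writelines(line for line in infile if line.strip())
        written.append(target)
    return written
```

For the folders in the question, the call would be clean_to_numbered(Path(r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output'), Path(r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\out')).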
Here's the solution:
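import os

directory = r'C:\Users\Max12\Desktop\xml\pdfminer\UiPath\output'
out_dir = os.path.join(directory, 'out')
os.makedirs(out_dir, exist_ok=True)
# Only process the .xml files (this also skips the 'out' folder itself)
list1 = [name for name in os.listdir(directory) if name.endswith('.xml')]
for name in list1:
    with open(os.path.join(directory, name)) as infile, open(os.path.join(out_dir, name), 'w') as outfile:
        for line in infile:
            if line.strip():  # skip empty lines instead of copying them
                outfile.write(line)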
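An alternative to looping line by line is to read each file whole, filter it in memory and write it back out. The sketch below splits that into a pure string function plus a small file wrapper; both names, strip_blank_lines and clean_file, are hypothetical.

```python
def strip_blank_lines(text: str) -> str:
    """Return `text` with every empty or whitespace-only line removed."""
    # keepends=True keeps each surviving line's own line ending. Note that
    # str.splitlines also breaks on a few rarer characters (e.g. '\f', '\u2028').
    return ''.join(line for line in text.splitlines(keepends=True) if line.strip())


def clean_file(src: str, dst: str) -> None:
    """Read src in one go, drop its blank lines and write the result to dst."""
    with open(src, 'r') as infile:
        data = infile.read()
    with open(dst, 'w') as outfile:
        outfile.write(strip_blank_lines(data))
```

Reading whole files is simple and fine for small XML exports like these; for very large files, the line-by-line loops above use less memory.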