Reputation: 13
I have thousands of files inside a directory with this pattern YYYY/MM/DD/HH/MM:
I want to keep just the hours, so I need to merge the 60 files for every hour of every day into one. I don't know how to search the filenames to get the 60 files that I want. This is what I wrote:
    def concat_files(path):
        file_list = os.listdir(path)
        with open(datetime.datetime.now(), "w") as outfile:
            for filename in sorted(file_list):
                with open(filename, "r") as infile:
                    outfile.write(infile.read())
How do I name the output file so that it keeps the date? I'm using datetime.now(), but that overrides the original filename. With my code I'm merging all files into one, whereas I should merge every group of 60 files into a different file.
Upvotes: 0
Views: 6510
Reputation: 1
Try this one.
    import os

    file_list = os.listdir(path)
    # Each distinct prefix (file name minus its last 6 characters) is one hour.
    for f in {f[:-6] for f in file_list}:
        if not f:
            continue
        with open(f + '.txt', 'a') as outfile:
            for file in sorted(s for s in file_list if s.startswith(f)):
                with open(os.path.join(path, file), 'r') as infile:
                    outfile.write(infile.read())
                # os.remove(os.path.join(path, file))  # optional
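To see why the `[:-6]` slice groups by hour, here is a small sketch on plain strings. It assumes the files are named `YYYYMMDDHHMM.txt` (inferred from the slice, not stated in the question), and the sample names are made up for illustration.

```python
# Hypothetical file names in the assumed YYYYMMDDHHMM.txt format.
file_list = [
    "201801010000.txt",
    "201801010001.txt",
    "201801010100.txt",
    "201801020059.txt",
]

# Dropping the last 6 characters ("MM" plus ".txt") leaves YYYYMMDDHH.
hour_prefixes = sorted({name[:-6] for name in file_list})

# Map each hour prefix to the minute files that belong to it.
groups = {
    prefix: sorted(n for n in file_list if n.startswith(prefix))
    for prefix in hour_prefixes
}

for prefix, names in groups.items():
    print(prefix, names)
```

If the real names have a different extension or length, the slice would need adjusting.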
Upvotes: 0
Reputation: 3541
You were not far off; you just need to swap your logic:
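    import os

    file_list = os.listdir(path)
    for filename in sorted(file_list):
        # Strip the minutes and '.txt' (last 6 characters) to get the hourly name.
        out_filename = filename[:-6] + '.txt'
        # Append mode: every minute file of the same hour lands in the same output.
        with open(out_filename, 'a') as outfile:
            with open(os.path.join(path, filename), 'r') as infile:
                outfile.write(infile.read())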
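Because the snippet above opens the output in append mode, running it twice would duplicate the content. A possible variant that rebuilds each hourly file from scratch is sketched below. The function name `merge_by_hour`, the separate output directory, and the `YYYYMMDDHHMM.txt` naming are assumptions for illustration, not part of the original answer.

```python
from collections import defaultdict
from pathlib import Path


def merge_by_hour(src_dir, dst_dir):
    """Merge per-minute files in src_dir into one file per hour in dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # Group the minute files by their YYYYMMDDHH prefix.
    groups = defaultdict(list)
    for p in src.glob("*.txt"):
        # Assumed naming: a 12-digit stem, YYYYMMDDHHMM; anything else is skipped.
        if len(p.stem) == 12 and p.stem.isdigit():
            groups[p.stem[:10]].append(p)

    for hour, paths in groups.items():
        # "w" mode: each hourly file is rewritten on every run.
        with open(dst / f"{hour}.txt", "w") as outfile:
            for p in sorted(paths):
                outfile.write(p.read_text())

    # Return the hour prefixes that were written, in sorted order.
    return sorted(groups)
```

It could be called as `merge_by_hour('C:/path/to/directory', 'C:/path/to/merged')`, where the second path is a placeholder for wherever the hourly files should go.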
Upvotes: 1
Reputation: 36608
You can use glob to get just the files you want. It lets you pass in a pattern to match against when searching for files. The last line below will only find files that begin with '2018010100', are followed by exactly two characters, and end with '.txt'.
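    import os
    from glob import glob

    def concat_files(dir_path, file_pattern, out_filename):
        # Collect only the files whose names match the pattern.
        file_list = glob(os.path.join(dir_path, file_pattern))
        # out_filename is an assumed extra parameter: open() needs a path, not a datetime.
        with open(out_filename, "w") as outfile:
            for filename in sorted(file_list):
                with open(filename, "r") as infile:
                    outfile.write(infile.read())

    concat_files('C:/path/to/directory', '2018010100??.txt', '2018010100.txt')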
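To check what a pattern like this accepts without touching the disk, the standard `fnmatch` module (which `glob` uses for its matching) can be tried on a few names. The names below are made up for illustration.

```python
from fnmatch import fnmatch

pattern = "2018010100??.txt"

# Hypothetical names, used only to show what the pattern accepts.
names = [
    "201801010000.txt",   # hour 00, minute 00
    "201801010059.txt",   # hour 00, minute 59
    "201801010100.txt",   # hour 01: different prefix
    "2018010100.txt",     # no minute digits
    "2018010100123.txt",  # three characters after the prefix
]

# Each "?" stands for exactly one character, so only the first two names match.
matches = [n for n in names if fnmatch(n, pattern)]
print(matches)
```

Note that `?` matches any single character, not only digits; a pattern such as `2018010100[0-9][0-9].txt` would restrict the match to two digits.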
Upvotes: 1