Mik

Reputation: 13

Python merging files in directory

I have thousands of files inside a directory with this pattern YYYY/MM/DD/HH/MM:

I want to keep just the hours, so I need to merge 60 files into one for every hour of every day. I don't know how to search the filenames to find the 60 files that I want. This is what I wrote:

def concat_files(path):
    file_list = os.listdir(path)
    with open(datetime.datetime.now(), "w") as outfile:
        for filename in sorted(file_list):
            with open(filename, "r") as infile:
                outfile.write(infile.read())

How do I name the output file so that it keeps the date? I'm using datetime.now(), but that overrides the original filename. My code merges all the files into one, whereas I need to merge each group of 60 into a separate file.

Upvotes: 0

Views: 6510

Answers (3)

Arthur Dent

Reputation: 1

Try this one.

import os

file_list = os.listdir(path)
# Each distinct prefix (the name minus the trailing 'MM.txt') identifies one hour.
for f in {f[:-6] for f in file_list}:
    if not f:
        continue
    with open(f + '.txt', 'a') as outfile:
        for file in sorted(s for s in file_list if s.startswith(f)):
            with open(os.path.join(path, file), 'r') as infile:
                outfile.write(infile.read())
            # os.remove(os.path.join(path, file))  # optional

Upvotes: 0

ChatterOne

Reputation: 3541

You were not that far off; you just need to swap your logic:

import os

file_list = os.listdir(path)
for filename in sorted(file_list):
    # Drop the trailing 'MM.txt' so every minute of an hour maps to the same output file.
    out_filename = filename[:-6] + '.txt'
    with open(out_filename, 'a') as outfile:
        with open(os.path.join(path, filename), 'r') as infile:
            outfile.write(infile.read())
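
As a rough, self-contained sketch of the same idea, the files can be grouped by their hour prefix first and then each group written out once. This assumes per-minute names of the form YYYYMMDDHHMM.txt (12 digits plus ".txt"), which the question does not spell out; `merge_by_hour`, `src_dir` and `dst_dir` are hypothetical names used only for illustration.

```python
import os
from collections import defaultdict


def merge_by_hour(src_dir, dst_dir):
    """Merge per-minute files (assumed YYYYMMDDHHMM.txt) into YYYYMMDDHH.txt."""
    groups = defaultdict(list)
    for name in os.listdir(src_dir):
        stem, ext = os.path.splitext(name)
        # Assumption: 12 digits followed by '.txt'; anything else is skipped.
        if ext == ".txt" and len(stem) == 12 and stem.isdigit():
            groups[stem[:10]].append(name)  # first 10 digits = YYYYMMDDHH

    os.makedirs(dst_dir, exist_ok=True)
    for hour, names in groups.items():
        # 'w' is safe here because each hour's file is opened exactly once.
        with open(os.path.join(dst_dir, hour + ".txt"), "w") as outfile:
            for name in sorted(names):  # sorted names give minute order
                with open(os.path.join(src_dir, name), "r") as infile:
                    outfile.write(infile.read())
    return sorted(groups)
```

Writing the merged files to a separate directory keeps them from being picked up as input if the function is run again.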

Upvotes: 1

James

Reputation: 36608

You can use glob to get just the files you want. It lets you pass in a pattern to match against when searching for files. In the last line below, it will only find files that begin with '2018010100', are followed by exactly two characters, and end with '.txt'.

import os
from glob import glob

def concat_files(dir_path, file_pattern, out_path):
    file_list = glob(os.path.join(dir_path, file_pattern))
    # open() needs a path string, so the output name is passed in explicitly.
    with open(out_path, "w") as outfile:
        for filename in sorted(file_list):
            with open(filename, "r") as infile:
                outfile.write(infile.read())

concat_files('C:/path/to/directory', '2018010100??.txt', '2018010100.txt')
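
To see what the `?` wildcard does without touching the disk, the same pattern can be tried against a list of names with the standard `fnmatch` module, which implements the shell-style matching that glob is based on. This is only an illustrative sketch: the file names are made up, and `hour_pattern` is a hypothetical helper that builds one pattern per hour, assuming names of the form YYYYMMDDHHMM.txt.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def hour_pattern(hour):
    """Build a glob pattern for all minute files of one hour (assumed naming)."""
    return hour.strftime("%Y%m%d%H") + "??.txt"


# Made-up names for illustration only.
names = [
    "201801010000.txt",  # 00:00, matches
    "201801010059.txt",  # 00:59, matches
    "201801010100.txt",  # 01:00, different hour
    "2018010100.txt",    # no minute digits, so '??' has nothing to match
]

pattern = hour_pattern(datetime(2018, 1, 1, 0))
matches = [n for n in names if fnmatchcase(n, pattern)]
print(pattern)  # 2018010100??.txt
print(matches)  # ['201801010000.txt', '201801010059.txt']
```

Each `?` matches exactly one character, so the already merged '2018010100.txt' is never picked up as input.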

Upvotes: 1
