Read multiple folders and combine the contents of the text files in each folder into one file per folder - Python

I'm new to Python. I have hundreds of folders in the same directory, and each folder contains multiple text files. I want to combine the contents of all the text files in each folder into one file per folder.

Folder1
    text1.txt
    text2.txt
    text3.txt
    .
    .

Folder2
    text1.txt
    text2.txt
    text3.txt
    .
    .

The output I need is the content of all text files in a folder copied into one file: text1.txt + text2.txt + text3.txt ---> Folder1.txt

Folder1
    text1.txt
    text2.txt
    text3.txt
    Folder1.txt

Folder2
    text1.txt
    text2.txt
    text3.txt
    Folder2.txt

I have the code below, which just lists the text files.

import os

for path, subdirs, files in os.walk('./data'):
    for filename in files:
        if filename.endswith('.txt'):
            print(os.path.join(path, filename))

Please help me with how to proceed with this task. Thank you.

Upvotes: 1

Views: 2813

Answers (2)

Mohima Chaudhuri

Reputation: 109

Breaking down the problem, we need a solution that:

  1. Finds all files in a directory
  2. Merges the contents of all those files into one file with the same name as the directory.

Then we apply this solution to every subdirectory of the base directory. I tested the code below.

Assumption: the subfolders contain only text files and no directories.

import os


# Function to merge all files in a folder
def merge_files(folder_path):
    # get all files in the folder,
    # assumption: folder has no directories and all text files
    files = os.listdir(folder_path)

    # form the file name for the new file to create
    new_file_name = os.path.basename(folder_path) + '.txt'
    new_file_path = os.path.join(folder_path, new_file_name)

    # open new file in write mode
    with open(new_file_path, 'w') as nf:
        # open files to merge in read mode
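        for file in files:
            # skip the merged file itself if it is left over from an earlier run
            if file == new_file_name:
                continue
            file = os.path.join(folder_path, file)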
            with open(file, 'r') as f:
                # read all lines of a file and write into new file
                lines_in_file = f.readlines()
                nf.writelines(lines_in_file)
                # insert a newline after reading each file
                nf.write("\n")


# Call function from the main folder with the subfolders
base_dir = "./test"
for folder in os.listdir(base_dir):
    folder_path = os.path.join(base_dir, folder)
    if os.path.isdir(folder_path):
        merge_files(folder_path)
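
If the subfolders might also contain non-text files, or the script might be run more than once, a variant along these lines could be a starting point. It is only a sketch using pathlib: the function name merge_txt_files is made up for illustration, and it assumes that only files ending in .txt should be merged, in sorted name order.

```python
from pathlib import Path


def merge_txt_files(base_dir):
    """Write <folder>/<folder>.txt for every direct subfolder of base_dir."""
    for folder in sorted(Path(base_dir).iterdir()):
        if not folder.is_dir():
            continue
        target = folder / (folder.name + ".txt")
        # Collect the sources before opening the target. The target is
        # excluded so a re-run does not merge an old result into the new one.
        sources = sorted(
            p for p in folder.glob("*.txt") if p.is_file() and p != target
        )
        with target.open("w") as out:
            for source in sources:
                text = source.read_text()
                out.write(text)
                # keep files apart even if one lacks a trailing newline
                if text and not text.endswith("\n"):
                    out.write("\n")
```

Calling merge_txt_files("./test") would then process the same base folder as the code above. Note that this version reads each file fully into memory, which should be fine for small text files but may not suit very large ones.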

Upvotes: 2

Luka Mesaric

Reputation: 677

First you need to get all folder names, which can be done with os.listdir(path_to_dir). Then iterate over them, and for each folder iterate over its children using the same function, concatenating their contents as described here: https://stackoverflow.com/a/13613375/13300960

Try writing it yourself, and update the question with your code if you need more help.

Edit: os.walk might not be the best solution here, since you know your folder structure and two listdir calls will do the job.

import os

basepath = '/path/to/directory' # maybe just '.'
for dir_name in os.listdir(basepath):
    dir_path = os.path.join(basepath, dir_name)
    if not os.path.isdir(dir_path):
        continue
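    out_name = dir_name + '.txt'
    with open(os.path.join(dir_path, out_name), 'w') as outfile:
        for file_name in os.listdir(dir_path):
            # skip non-text files and the output file itself, which already
            # exists at this point and would otherwise be read while written
            if not file_name.endswith('.txt') or file_name == out_name:
                continue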
            file_path = os.path.join(dir_path, file_name)
            with open(file_path) as infile:
                for line in infile:
                    outfile.write(line)

This is not the most robust code, but it is short and should get the job done.
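
If you would rather keep the os.walk loop from the question, something like the following might work. Treat it as a sketch: the function name merge_with_walk is hypothetical, it assumes the files should be merged in sorted name order, and it uses shutil.copyfileobj instead of a line-by-line loop to copy each file in chunks.

```python
import os
import shutil


def merge_with_walk(base_dir):
    """Create <folder>.txt inside every folder found below base_dir."""
    for path, subdirs, files in os.walk(base_dir):
        if os.path.abspath(path) == os.path.abspath(base_dir):
            continue  # the top directory itself only holds the folders
        out_name = os.path.basename(path) + ".txt"
        # 'files' was listed before the output file is opened; an output
        # left over from an earlier run is filtered out by name
        txt_files = sorted(
            f for f in files if f.endswith(".txt") and f != out_name
        )
        with open(os.path.join(path, out_name), "w") as outfile:
            for filename in txt_files:
                with open(os.path.join(path, filename)) as infile:
                    # copies in chunks, so large files are not loaded whole
                    shutil.copyfileobj(infile, outfile)
```

Unlike the two-listdir version, os.walk also descends into nested folders, so each nested folder would get its own merged file as well. No separator is written between files here, so a file without a trailing newline runs straight into the next one.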

Upvotes: 1
