Massive_Shed

Reputation: 47

Concatenating a list of files with subprocess and wildcards in Python

I'm trying to concatenate multiple files in a directory to a single file. So far I've been trying to use cat with subprocess with poor results.

My original code was:

source = ['folder01/*', 'folder02/*']
target = ['/output/File1', '/output/File2']

for f1, f2, in zip(source, target):
    subprocess.call(['cat', f1, '>>', f2])

I've tried handing it shell=True:

..., f2], shell=True)

And in conjunction with subprocess.Popen instead of call in a number of permutations, but with no joy.

As I've understood from other similar questions, with shell=True the command will need to be provided as a string. How can I go about calling cat on all items in my list whilst executing as a string?

Upvotes: 0

Views: 1247

Answers (1)

Arount

Reputation: 10403

You don't need subprocess here, and you should avoid subprocess whenever you can (which is 99.99% of the time).

As Joel pointed out in the comments, I should take a few minutes and a few bullet points to explain why:

  1. Using subprocess (or similar) assumes your code will always run in the exact same environment: same OS, version, shell, installed tools, etc. That is not suitable for production-grade code.
  2. These kinds of libraries prevent you from writing "Pythonic" code: you end up handling errors by checking exit codes and parsing output strings instead of using try / except, etc.
  3. Tim Peters wrote the Zen of Python, and I encourage you to follow it. At least three of its points are relevant here: "Beautiful is better than ugly.", "Readability counts." and "Simple is better than complex."

In other words: subprocess will make your code less robust, force you to handle non-Python issues, and push you into tricky workarounds where you could just write clean and powerful Python code.

There are many more good reasons not to use subprocess, but I think you get the point.
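
To illustrate the second point, here is a minimal sketch of what error handling looks like when the files are opened directly from Python. The helper name append_file and the file names in the demonstration are only assumptions for the example; they are not part of the question.

```python
import os
import tempfile


def append_file(source_path, output_path):
    """Append one file to another; return False instead of raising if that fails."""
    try:
        # Binary mode avoids text-decoding errors on arbitrary input
        with open(source_path, 'rb') as source, open(output_path, 'ab') as output:
            output.write(source.read())
    except OSError as error:
        # FileNotFoundError, PermissionError and IsADirectoryError
        # are all subclasses of OSError
        print(f'Skipping {source_path}: {error}')
        return False
    return True


# Small demonstration in a throwaway directory (hypothetical file names)
with tempfile.TemporaryDirectory() as workdir:
    existing = os.path.join(workdir, 'a.txt')
    missing = os.path.join(workdir, 'missing.txt')
    output_path = os.path.join(workdir, 'out.txt')
    with open(existing, 'w') as handle:
        handle.write('hello\n')
    copied = append_file(existing, output_path)
    skipped = append_file(missing, output_path)
    with open(output_path) as handle:
        result = handle.read()
```

The failure arrives as an ordinary exception with a specific type, so there is no exit code to inspect and no stderr text to parse.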


Just open the files with open. Here is a basic example that you will need to adapt:

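import os

output_name = 'output.txt'

for filename in sorted(os.listdir('./')):
    # Skip directories and the output file itself
    if filename == output_name or not os.path.isfile(filename):
        continue
    with open(filename, 'r') as fileh:
        with open(output_name, 'a') as outputh:
            outputh.write(fileh.read())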

Implementation example for your specific needs:

import os

sources = ['/tmp/folder01/', '/tmp/folder02/']
targets = ['/tmp/output/File1', '/tmp/output/File2']

# Pair each source directory with its target file
for directory, output_file in zip(sources, targets):
    # Open the output file once per directory, in append mode
    with open(output_file, 'a') as outputh:
        # Loop over the directory entries in sorted order for a deterministic result
        for filename in sorted(os.listdir(directory)):
            # Build the full path of the entry
            filepath = os.path.join(directory, filename)
            # Skip subdirectories
            if not os.path.isfile(filepath):
                continue
            # Append the content of the source file to the output
            with open(filepath, 'r') as fileh:
                outputh.write(fileh.read())

Be careful: I changed your source and target values. They now live under /tmp/, and the sources are directories rather than wildcard patterns.
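
If you would rather keep the wildcard patterns from your original lists, the standard glob module can expand them from Python. The following is a rough sketch under that assumption: the function name concatenate and the demonstration paths are made up for the example, and it uses shutil.copyfileobj to stream each file in chunks instead of reading it whole.

```python
import glob
import os
import shutil
import tempfile


def concatenate(pattern, target):
    """Append every regular file matching `pattern` to `target`, in sorted order."""
    target_path = os.path.abspath(target)
    with open(target, 'ab') as output:
        for path in sorted(glob.glob(pattern)):
            # Skip directories, and the target itself if the pattern matches it
            if not os.path.isfile(path) or os.path.abspath(path) == target_path:
                continue
            with open(path, 'rb') as source:
                # Copy in chunks so large files are not loaded into memory at once
                shutil.copyfileobj(source, output)


# Small demonstration in a throwaway directory (hypothetical layout)
with tempfile.TemporaryDirectory() as workdir:
    folder = os.path.join(workdir, 'folder01')
    os.mkdir(folder)
    for name, text in [('b.txt', 'second\n'), ('a.txt', 'first\n')]:
        with open(os.path.join(folder, name), 'w') as handle:
            handle.write(text)
    target = os.path.join(workdir, 'File1')
    concatenate(os.path.join(folder, '*'), target)
    with open(target) as handle:
        combined = handle.read()
```

With the lists from the question, this would be called once per pair, for example by looping over zip(source, target) and passing each pattern and target to concatenate.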

Upvotes: 1
