lynx

Reputation: 49

Parsing cmd args like typical filter programs

I spent a few hours reading tutorials about argparse and managed to learn to use normal parameters. The official documentation is not very readable to me. I'm new to Python. I'm trying to write a program that could be invoked in the following ways:

cat inFile | program [options] > outFile -- If no inFile or outFile is specified, read from stdin and write to stdout.

program [options] inFile outFile

program [options] inFile > outFile -- If only one file is specified, it is the input, and output should go to stdout.

cat inFile | program [options] - outFile -- If '-' is given in place of inFile, read from stdin.

program [options] /path/to/folder outFile -- Process all files from /path/to/folder and its subdirectories.

I want it to behave like a regular CLI program under GNU/Linux.

It would also be nice if the program could be invoked as:

program [options] inFile0 inFile1 ... inFileN outFile -- The first path/file is always interpreted as input and the last one always as output; any additional ones are interpreted as inputs.

I could probably write dirty code that would accomplish this, but this is going to be used, so someone will end up maintaining it (and he will know where I live...).

Any help/suggestions are much appreciated.


Combining the answers and some more knowledge from the Internet, I've managed to write this (it does not accept multiple inputs, but this is enough):

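import sys, argparse, os.path, glob

def inputFile(path):
    # '-' means read from standard input
    if path == "-":
        return [sys.stdin]
    elif os.path.exists(path):
        if os.path.isfile(path):
            return [path]
        else:
            # directory: collect *.dat files from it and all of its subdirectories
            return [y for x in os.walk(path) for y in glob.glob(os.path.join(x[0], '*.dat'))]
    else:
        # let argparse print a usage message and exit with status 2
        raise argparse.ArgumentTypeError("no such file or directory: %r" % path)

def main(argv):
    cmdArgsParser = argparse.ArgumentParser()
    cmdArgsParser.add_argument('inFile', nargs='?', default='-', type=inputFile)
    cmdArgsParser.add_argument('outFile', nargs='?', default='-', type=argparse.FileType('w'))
    # parse the argv passed in instead of silently re-reading sys.argv
    cmdArgs = cmdArgsParser.parse_args(argv)

    print(cmdArgs.inFile)
    print(cmdArgs.outFile)

if __name__ == "__main__":
    main(sys.argv[1:])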
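For the directory case, the nested os.walk/glob comprehension can be expressed more directly with pathlib's rglob, which searches a folder and all of its subdirectories. This is a swapped-in technique rather than what the code above uses, and the helper name dat_files is hypothetical; it is a rough equivalent, not a drop-in guarantee of identical results.

```python
import pathlib

def dat_files(folder):
    # hypothetical helper: recursively collect regular files ending in .dat
    # under `folder`; rglob walks every subdirectory, is_file() skips
    # directories that happen to match the pattern
    return sorted(str(p) for p in pathlib.Path(folder).rglob('*.dat') if p.is_file())
```

Sorting the result gives a stable processing order, since neither os.walk nor rglob promises one.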

Thank you!

Upvotes: 4

Views: 765

Answers (2)

hpaulj

Reputation: 231665

I'll give you a starter script to play with. It uses optionals rather than positionals, and only one input file, but it should give a taste of what you can do.

import argparse

parser = argparse.ArgumentParser()
inarg = parser.add_argument('-i','--infile', type=argparse.FileType('r'), default='-')
outarg = parser.add_argument('-o','--outfile', type=argparse.FileType('w'), default='-')

args = parser.parse_args()

print(args)
cnt = 0
for line in args.infile:
    print(cnt, line)
    args.outfile.write(line)
    cnt += 1

When called without arguments, it just echoes your input (after ^D). I'm a little bothered that it doesn't exit until I issue another ^D.

FileType is convenient, but it has a major fault: it opens the files, and you have to close them yourself or let Python do so when exiting. There's also the complication that you don't want to close stdin/stdout.
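One way around both problems is to keep the file names as plain strings and open them with a small context manager that closes real files but leaves stdin/stdout alone. This is a minimal sketch using contextlib (not something argparse provides); the name open_or_std is hypothetical, and the mode check is deliberately simple.

```python
import contextlib
import sys

@contextlib.contextmanager
def open_or_std(name, mode='r'):
    """Hypothetical helper: open `name` like open(), but map '-' to
    stdin/stdout. Real files are closed when the with-block exits;
    stdin/stdout are yielded as-is and never closed."""
    if name == '-':
        # a write/append mode means stdout; anything else means stdin
        yield sys.stdout if ('w' in mode or 'a' in mode) else sys.stdin
    else:
        with open(name, mode) as f:
            yield f
```

Usage would look like `with open_or_std(args.infile) as fin, open_or_std(args.outfile, 'w') as fout:`, assuming infile/outfile were added without a type so they stay strings.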

The best argparse questions include a basic script and specific questions on how to correct or improve it. Your specs are reasonably clear, but it would be nice if you gave us more to work with.


To handle the subdirectories option, I would skip the FileType bit. Use argparse to get two lists of strings (or a list and a name), and then do the necessary chdir and/or glob to find and iterate over files. Don't expect argparse to do the actual work; use it to parse the command-line strings. Here is a sketch of such a script, leaving most details for you to fill in.

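import argparse
import glob
import os
import sys  # for stdin/stdout
# ... further imports and helpers as needed

def open_output(outfile):
    # open a file for writing; '-' means stdout
    # returns a file object
    if outfile == '-':
        return sys.stdout
    return open(outfile, 'w')

def glob_dir(adir, pattern='*'):
    # glob a dir and all of its subdirectories
    # returns a list of file names ready to open
    found = []
    for dirpath, dirnames, filenames in os.walk(adir):
        found.extend(name for name in glob.glob(os.path.join(dirpath, pattern))
                     if os.path.isfile(name))
    return sorted(found)

def open_forread(afilename):
    # open a file for reading; '-' means stdin
    if afilename == '-':
        return sys.stdin
    return open(afilename, 'r')

def walkdirs(alist):
    outlist = []
    for name in alist:
        if name == '-' or os.path.isfile(name):
            outlist.append(name)
        elif os.path.isdir(name):
            outlist.extend(glob_dir(name))
        else:
            sys.exit('%s: no such file or directory' % name)
    return outlist

def cat(infile, outfile):
    # do your thing here; this placeholder just copies the input through
    for line in infile:
        outfile.write(line)

def main(args):
    # handle args options; with no input names, fall back to stdin
    filelist = walkdirs(args.inlist or ['-'])
    fout = open_output(args.outfile)
    for name in filelist:
        fin = open_forread(name)
        cat(fin, fout)
        if fin is not sys.stdin:
            fin.close()
    if fout is not sys.stdout:
        fout.close()

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('inlist', nargs='*')
    parser.add_argument('outfile')
    # add options
    args = parser.parse_args()
    main(args)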

The parser here requires you to give it an outfile name, even if it is '-'. I could define its nargs='?' to make it optional, but that does not play nicely with the '*' of inlist.

Consider

myprog one two three

Is that

namespace(inlist=['one','two','three'], outfile=default)

or

namespace(inlist=['one','two'], outfile='three')

With both a * and ? positional, the identity of the last string is ambiguous - is it the last entry for inlist, or the optional entry for outfile? argparse chooses the former, and never assigns the value to outfile.
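The difference is easy to check directly. This small sketch builds both variants (the parser names are just for illustration) and parses the same three strings:

```python
import argparse

# '*' followed by '?': the trailing optional positional never gets a value
ambiguous = argparse.ArgumentParser(prog='myprog')
ambiguous.add_argument('inlist', nargs='*')
ambiguous.add_argument('outfile', nargs='?', default='-')

# '*' followed by a required positional: the last string goes to outfile
required = argparse.ArgumentParser(prog='myprog')
required.add_argument('inlist', nargs='*')
required.add_argument('outfile')

print(ambiguous.parse_args(['one', 'two', 'three']))
# Namespace(inlist=['one', 'two', 'three'], outfile='-')
print(required.parse_args(['one', 'two', 'three']))
# Namespace(inlist=['one', 'two'], outfile='three')
```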

With --infile, --outfile definitions, the allocation of these strings is clear.
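A minimal sketch of that variant, keeping the inputs positional but moving the output to a hypothetical -o/--outfile flag, shows that no string is ambiguous any more:

```python
import argparse

parser = argparse.ArgumentParser(prog='myprog')
parser.add_argument('inlist', nargs='*')
parser.add_argument('-o', '--outfile', default='-')

args = parser.parse_args(['-o', 'three', 'one', 'two'])
print(args.inlist, args.outfile)   # ['one', 'two'] three
args = parser.parse_args(['one', 'two', 'three'])
print(args.inlist, args.outfile)   # ['one', 'two', 'three'] -
```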

In one sense this problem is too complex for argparse: there's nothing in it to handle things like directories. In another sense it is too simple: you could just as easily split sys.argv[1:] between inlist and outfile without the help of argparse.
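A sketch of that manual split, following the rules from the question (no names: stdin to stdout; one name: that input, stdout as output; two or more: the last is the output). The function name split_args is hypothetical, and it assumes options have already been removed, for example with argparse's parse_known_args, which returns the namespace plus the leftover strings.

```python
def split_args(argv):
    """Hypothetical helper: split positional strings into (inputs, output)."""
    if not argv:
        return ['-'], '-'           # read stdin, write stdout
    if len(argv) == 1:
        return [argv[0]], '-'       # single name is the input
    return argv[:-1], argv[-1]      # last name is the output
```

For example, `split_args(['a', 'b', 'c'])` returns `(['a', 'b'], 'c')`.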

Upvotes: 0

o11c

Reputation: 16146

You need positional arguments (names not starting with a dash) that are optional (nargs='?') and have a default (default='-'). Additionally, argparse.FileType is a convenience factory that returns sys.stdin or sys.stdout (depending on the mode) when '-' is passed.
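The '-' mapping can be seen without touching the file system: when no positionals are given, argparse passes the string default '-' through the type callable, and FileType turns it into the standard stream. A small sketch:

```python
import argparse
import sys

parser = argparse.ArgumentParser(prog='foo')
parser.add_argument('in_file', nargs='?', default='-', type=argparse.FileType('r'))
parser.add_argument('out_file', nargs='?', default='-', type=argparse.FileType('w'))

# No positionals: the default '-' goes through FileType,
# which maps it to stdin for mode 'r' and stdout for mode 'w'.
args = parser.parse_args([])
print(args.in_file is sys.stdin, args.out_file is sys.stdout)  # True True
```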

All together:

#!/usr/bin/env python

import argparse

# the first argument is prog; it defaults to os.path.basename(sys.argv[0])
parser = argparse.ArgumentParser('foo')
parser.add_argument('in_file', nargs='?', default='-', type=argparse.FileType('r'))
parser.add_argument('out_file', nargs='?', default='-', type=argparse.FileType('w'))

def main():
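    # parse_args() defaults to sys.argv[1:]
    # note: FileType opens files as soon as they are parsed, so 'bar' must
    # already exist and 'baz' will be created (or truncated)
    args = parser.parse_args(['bar', 'baz'])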
    print(args)
    args = parser.parse_args(['bar', '-'])
    print(args)
    args = parser.parse_args(['bar'])
    print(args)
    args = parser.parse_args(['-', 'baz'])
    print(args)
    args = parser.parse_args(['-', '-'])
    print(args)
    args = parser.parse_args(['-'])
    print(args)
    args = parser.parse_args([])
    print(args)

if __name__ == '__main__':
    main()

Upvotes: 2
