Kapernski

Reputation: 719

Parsing non-standard arguments

I am trying to create a program that replaces one or many characters from a source file with one or many specified characters and writes the new text into a destination file.

For example, the following calls to the program "sub" all replace instances of 'a' and 'b' in src.txt with 'x' and 'y' respectively and write the result to dest.txt.

$ sub --ab -+xy -i src.txt -o dest.txt 
$ sub -i src.txt -o dest.txt --ab -+xy
$ sub -o dest.txt --ab -i src.txt -+xy
$ sub --ab -o dest.txt -+xy -i src.txt 

I have looked at C's getopt(), but I don't think it handles options that are followed directly by multiple characters.

The way the program accepts arguments is fixed. How would I parse these arguments, given that an option may carry several letters to replace in the text file, and how would I handle any argument ordering?

I cannot switch on strings, and I cannot create an enum containing the special characters used in the options. As far as I know, getopt() doesn't handle the way my program expects arguments. So I'm left with the following very incomplete code:

int main(int argc, char * argv[]) {

    // help message displayed for "sub" and "sub -h"
    if (argc == 1 || strcmp(argv[1], "-h") == 0){
        helpMsg();
    } else {
        // process rest of argv
        int i = 2;
        while (argc != 0) {

            char *opt = argv[i];

            switch(opt){

            }

            i++;
            argc--;
        }
    }

    return 0;
}

Upvotes: 1

Views: 138

Answers (2)

Dúthomhas

Reputation: 10028

Sometimes the simplest thing is just to process your options in a loop, testing for each case.

#include <iso646.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Print a message and terminate the program; this function never returns.
void help_quit( const char * message )
{
  printf( "%s\n", message );
  exit( 0 );
}

int main( int argc, char ** argv )
{
  FILE       * infile       = NULL;
  FILE       * outfile      = NULL;
  const char * in_filename  = NULL;
  const char * out_filename = NULL;
  const char * to_remove    = "";
  const char * to_add       = "";
  
  for (int n = 1;  n < argc;  n += 1)
  {
    // All arguments must begin with '-'
    if (argv[n][0] != '-') 
    {
      help_quit( "All arguments must begin with a dash" );
    }
    
    // -h, --help
    if ((strcmp( argv[n], "-h" ) == 0) or (strcmp( argv[n], "--help" ) == 0))
    {
      help_quit( "usage:\n  sub ..." );
    }
    
    // -i FILENAME
    if (strcmp( argv[n], "-i" ) == 0)
    {
      if (argc-n < 2) help_quit( "missing input filename" );
      in_filename = argv[++n];
      continue;
    }
    
    // -o FILENAME
    if (strcmp( argv[n], "-o" ) == 0)
    {
      if (argc-n < 2) help_quit( "missing output filename" );
      out_filename = argv[++n];
      continue;
    }
    
    if (argv[n][1] == '-')
    {
      to_remove = argv[n] + 2;
      continue;
    }
    
    if (argv[n][1] == '+')
    {
      to_add = argv[n] + 2;
      continue;
    }
    
    help_quit( "unknown option" );
  }
  
  // validate args
  if (strlen( to_remove ) != strlen( to_add )) help_quit( "number of remove chars != number of add chars" );
  if (!in_filename)  help_quit( "you must specify an input filename" );
  if (!out_filename) help_quit( "you must specify an output filename" );
  if (!(infile  = fopen( in_filename,  "r" ))) help_quit( "could not open input file" );
  if (!(outfile = fopen( out_filename, "w" ))) 
  {
    fclose( infile );
    help_quit( "could not open output file" );
  }
  
  // do stuff
  printf( "%s\n", "TO DO" );
  
  // clean up
  fclose( outfile );
  fclose( infile );
  return 0;
}
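
The "do stuff" step is left open above. One way it might be filled in, as a minimal sketch: build a 256-entry lookup table from the two option strings and pass every byte of the input through it. The names build_table and translate_stream are illustrative and not part of the code above. The sketch assumes single-byte characters and that to_remove and to_add have already been checked to be the same length.

```c
#include <stdio.h>

/* Fill table so that table[c] is the replacement for byte c.
   Bytes not listed in 'from' map to themselves.
   Assumes strlen(from) == strlen(to), as validated before the call. */
void build_table(unsigned char table[256], const char *from, const char *to)
{
    for (int c = 0; c < 256; c++)
        table[c] = (unsigned char)c;
    for (size_t k = 0; from[k] != '\0' && to[k] != '\0'; k++)
        table[(unsigned char)from[k]] = (unsigned char)to[k];
}

/* Copy 'in' to 'out', replacing each byte via the table.
   Returns 0 on success, -1 on a read or write error. */
int translate_stream(FILE *in, FILE *out, const unsigned char table[256])
{
    int c;
    while ((c = getc(in)) != EOF)
        if (putc(table[c], out) == EOF)
            return -1;
    return ferror(in) ? -1 : 0;
}
```

In the program above, this would be called as build_table(table, to_remove, to_add) followed by translate_stream(infile, outfile, table), in place of the printf placeholder.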

Upvotes: 1

Jonathan Leffler

Reputation: 753475

Although it is highly unorthodox, this code works on a Mac:

#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int opt;
    while ((opt = getopt(argc, argv, "i:o:+:-:")) != -1)
    {
        switch (opt)
        {
        case '+':
        case '-':
        case 'i':
        case 'o':
            printf("Got '-%c' argument '%s'\n", opt, optarg);
            break;
        default:
            printf("!! FAIL !! optopt = %c\n", optopt);
            break;
        }
    }
    return 0;
}

It tells getopt() that - is an option 'letter' (character) that expects an argument, and that + is an option character that expects an argument. I can run it like this (code in getopt23.c compiled to getopt23):

$ ./getopt23 -i input -o output --ab -+xy
Got '-i' argument 'input'
Got '-o' argument 'output'
Got '--' argument 'ab'
Got '-+' argument 'xy'
$

Note that the space between -i and the input file name (and likewise for the other options) is not mandatory with this code:

$ ./getopt23 -iinput -ooutput -+ xy --ab
Got '-i' argument 'input'
Got '-o' argument 'output'
Got '-+' argument 'xy'
Got '--' argument 'ab'
$ ./getopt23 -iinput -ooutput -+ xy -- ab
Got '-i' argument 'input'
Got '-o' argument 'output'
Got '-+' argument 'xy'
$

The second of these two is interesting: the -- indicates the end of the options, and the ab is a non-option argument (typically a file name). If the code were extended with a loop:

for (int i = optind; i < argc; i++)
    printf("Plain argument %d: '%s'\n", i, argv[i]);

then the ab (but not the --) would be printed as a 'plain argument'. (The POSIX Utility Syntax Guidelines use the name 'operand' for what I called 'non-option arguments'.)

If you write your own code, you can enforce the 'file name separate from option' and 'replacement string attached to -- or -+ option'. With regular getopt(), it is fiendishly difficult to do that without incurring undefined behaviour.
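
As a rough illustration of the hand-written route, a small classifier can turn each argument into an enum value, which can then drive a switch (something C does not allow on strings directly). The enum ArgKind and the function classify_arg are hypothetical names for this sketch. It assumes the strict notation in which -i and -o stand alone and the character sets are attached to -- and -+.

```c
#include <string.h>

/* Kinds of argument the strict notation allows (hypothetical names). */
enum ArgKind { ARG_HELP, ARG_INPUT, ARG_OUTPUT, ARG_FROM, ARG_TO, ARG_UNKNOWN };

/* Classify one argv entry. For ARG_FROM and ARG_TO, *value is set to the
   characters attached after "--" or "-+"; otherwise *value is set to NULL. */
enum ArgKind classify_arg(const char *arg, const char **value)
{
    *value = NULL;
    if (strcmp(arg, "-h") == 0) return ARG_HELP;
    if (strcmp(arg, "-i") == 0) return ARG_INPUT;   /* file name is the next argv entry */
    if (strcmp(arg, "-o") == 0) return ARG_OUTPUT;  /* file name is the next argv entry */
    if (strncmp(arg, "--", 2) == 0 && arg[2] != '\0') {
        *value = arg + 2;                           /* characters to replace */
        return ARG_FROM;
    }
    if (strncmp(arg, "-+", 2) == 0 && arg[2] != '\0') {
        *value = arg + 2;                           /* replacement characters */
        return ARG_TO;
    }
    return ARG_UNKNOWN;                             /* includes "-iinput" and a bare "--" */
}
```

A main loop would switch on the returned value and, for ARG_INPUT and ARG_OUTPUT, take the following argv entry as the file name after checking that one exists. Because "-iinput" is classified as unknown, the 'file name separate from option' rule is enforced by construction.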


I previously commented that:

And you are correct that neither (POSIX) getopt() nor any standard variant will handle this notation — that's what 'non-standard' means in this context.

I have to partially withdraw that statement. If you must have the -i and the input file name in separate arguments and not presented as -iinput, and similarly with -o and the output file name, and if the character sets must be attached to -- and -+, then you cannot use getopt() reliably. If that notation can be flexible, you can use getopt() after all and my previous comment is an over-statement.

Upvotes: 2
