Regina

Reputation: 57

How to use python ArgumentParser on databricks?

I would like to run my Python code by passing it some arguments via ArgumentParser. The parser code looks like this:

import argparse
import logging
import sys


def parse_args(argv):
    global Settings, COST_PP, COST_BP, COST_NP, COST_PN, COST_BN, COST_NN

    desc = "..."
    parser = argparse.ArgumentParser(description=desc)

    parser.add_argument("infile", action="store")
    parser.add_argument("-o", "--outfile", action="store", dest="outfile")
    args = parser.parse_args(argv)


def main():
    global Settings

    parse_args(sys.argv[1:])

    print("\t".join(sys.argv[1:]))
    logging.info("SETTINGS:")
    for k, v in Settings.items():
        logging.info("\t\t" + str(k) + ":\t" + str(v))
    # ...

if __name__ == '__main__':
    main()

But I got an error like this:

usage: PythonShell.py [-h] [-o OUTFILE] [-alg ALGORITHM] [-cL CLASS_LIST]
                  [-n RUNS] [-tf TRAIN_FRAC] [-cs COST_SET]
                  [-ms MULT_STRAT] [--log LOG_FILE] [-d]
                  infile
PythonShell.py: error: unrecognized arguments: 0 50000 1916 
a91f477cb4de44dfa5d1f3dd01f8f606 2.2.0
To exit: use 'exit', 'quit', or Ctrl-D.

How can I run this code correctly in a Databricks notebook? I'd appreciate any help!

Upvotes: 1

Views: 4752

Answers (2)

user12916762

Reputation: 11

The only way I found to pass parameters to a Databricks Python script/job from Azure Data Factory was to use shlex:

    import argparse, shlex, sys
    parser = argparse.ArgumentParser()
    parser.add_argument('--arg1', type=str, required=True)
    parser.add_argument('--arg2', type=str, required=True)
    args = parser.parse_args(shlex.split(" ".join(sys.argv[1:])))

Arguments are passed by Databricks as: ["script", "--arg1 val", "--arg2 val"], while argparse expects argument names and values to be split: ["script", "--arg1", "val", "--arg2", "val"].

We first join all the arguments into one string and then split it with shlex to get the correct list to pass to parse_args.
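
To make the join-then-split step concrete, here is a minimal sketch. The list `fused_argv` is a hypothetical stand-in for `sys.argv`, built to match the shape described above (each flag fused with its value), and the values `val1` and `val 2` are made up for illustration:

```python
import argparse
import shlex

# Hypothetical argv: each flag and its value arrive fused into one string.
fused_argv = ["script", "--arg1 val1", "--arg2 'val 2'"]

parser = argparse.ArgumentParser()
parser.add_argument("--arg1", type=str, required=True)
parser.add_argument("--arg2", type=str, required=True)

# Join everything after the script name, then split using shell-like rules
# so that quoted values containing spaces stay in one token.
tokens = shlex.split(" ".join(fused_argv[1:]))
args = parser.parse_args(tokens)

print(tokens)
print(args.arg1, args.arg2)
```

Note that a value containing spaces only survives this round trip if it is quoted; an unquoted value would be split into several tokens.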

Upvotes: 1

Erik563

Reputation: 1

I had a similar issue, even when I tried to run the most basic example from the documentation. You can still use the argument parser in Databricks in two ways:

  1. Pass the arguments in an array:

    import argparse
    parser = argparse.ArgumentParser(description='Process some integers.')
    parser.add_argument('integers', metavar='N', type=int, nargs='+',
                        help='an integer for the accumulator')
    parser.add_argument('--sum', dest='accumulate', action='store_const',
                        const=sum, default=max,
                        help='sum the integers (default: find the max)')
    ## normally you can now do: parser.parse_args(), instead do this:
    parser.parse_args(['23', '35'])
    
  2. Place the code in a separate notebook and run that notebook from your current notebook with the %run command.
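
For option 1, one way to keep the same code usable both in a notebook and elsewhere is to have the entry point take the argument list as a parameter. This is only a sketch: `build_parser` and `run` are hypothetical helper names, and the parser is the documentation example from above with the help texts dropped.

```python
import argparse


def build_parser():
    # Same arguments as the documentation example above.
    parser = argparse.ArgumentParser(description="Process some integers.")
    parser.add_argument("integers", metavar="N", type=int, nargs="+")
    parser.add_argument("--sum", dest="accumulate", action="store_const",
                        const=sum, default=max)
    return parser


def run(argv):
    # argv is an explicit list, so sys.argv is never consulted. In the
    # question, sys.argv appears to hold values supplied by the notebook
    # shell, which is what argparse rejected.
    args = build_parser().parse_args(argv)
    return args.accumulate(args.integers)


result_max = run(["23", "35"])
result_sum = run(["23", "35", "--sum"])
print(result_max, result_sum)
```

In a notebook cell you would call `run([...])` with whatever list you need; a script entry point could call `run(sys.argv[1:])` instead.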

Upvotes: 0
