Ozdanny

Reputation: 73

Using command-line parameters within Jupyter notebook

I'm running a Python script within a Jupyter notebook and attempting to loop it over a list. However, the variables I pass on the command line are not being recognised as variables; they are taken as literal strings. This question and this question seem similar to what I want, but I have no experience with argparse, so I do not know where to start.

My code:

import got
retailers = ["handle1", "handle2"]

for retailer in retailers:
    string = "keyword " + "@"+ retailer
    file_name = "keyword_" + retailer
    %run Exporter.py --querysearch string --since 2018-01-01 --maxtweets 50 --output file_name

What it looks like when running from command line:

python Exporter.py --querysearch "keyword @retailer" --since 2018-01-01 --maxtweets 50 --output "keyword_retailer"

The problem is that the script Exporter.py is searching for the term "retailer" and not actually what I want, which is "keyword @Retailer". Same for the output file, which is being saved as "file_name" and not "keyword_retailer".

Any ideas on how I can solve this?

For context if it is needed, I am using this package.

EDIT:

I have added this to my code however I get the error listed below. I've also attached the module Exporter.py as I can't seem to fix this error.

import argparse
import sys
import Exporter 
def main(args):
    # parse arguments using optparse or argparse or what have you
    parser = argparse.ArgumentParser(description="Do something.")
    parser.add_argument("--querysearch", type=str, default= 2, required=True)
    parser.add_argument("--maxtweets", type=int, default= 4, required=True)
    parser.add_argument("--output", type=str, default= 4, required=True)
    parser.add_argument("--since", type=int, default= 4, required=True)

if __name__ == '__main__':
    import sys
    main(sys.argv[1:])

for retailer in retailers:
    string = "palm oil " + "@"+ retailer
    file_name = "palm_oil_" + retailer
    #print string
    #print file_name
    Exporter.main([string,"2018-01-01", 50, file_name])

Error message:

UnboundLocalError                         Traceback (most recent call last)
<ipython-input-35-4731f5aa548f> in <module>()
      4     #print string
      5     #print file_name
----> 6     Exporter.main([string,"2018-01-01", "50", file_name])

/Users/jamesozden/GetOldTweets-python-master/Exporter.pyc in main(argv)
     70                 got.manager.TweetManager.getTweets(tweetCriteria, receiveBuffer)
     71 
---> 72 
     73         finally:
     74                 outputFile.close()

UnboundLocalError: local variable 'arg' referenced before assignment

I have also tried this style of solution, using {} to mark names as variables rather than strings, as per the answer to another question, but with no success:

!python training.py --cuda --emsize 1500 --nhid 1500 --dropout {d} --epochs {e} 

Upvotes: 2

Views: 5485

Answers (1)

hpaulj

Reputation: 231698

Your description is unclear about when you are running the script from a shell and when from IPython (or a notebook) using %run, so I'll focus on the argparse problems:

First this needs a parse_args:

def main(argv):
    # parse arguments using optparse or argparse or what have you
    parser = argparse.ArgumentParser(description="Do something.")
    parser.add_argument("--querysearch", type=str, default= 2, required=True)
    parser.add_argument("--maxtweets", type=int, default= 4, required=True)
    parser.add_argument("--output", type=str, default= 4, required=True)
    parser.add_argument("--since", type=int, default= 4, required=True)

    args = parser.parse_args(argv)   # parse_args is a method of the parser
    print(args)     # a good debugging step
    return args   # or do something with them

The argv parameter needs to look like what it would get via sys.argv[1:], that is, a list like:

['--querysearch', "keyword @retailer", '--since', '2018-01-01', ...]
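As a rough sketch of how such a list could be built from variables inside the loop: the parser below is a hypothetical stand-in that mirrors the four options from the question (it is not the actual Exporter.py), and it assumes --since is kept as a string because a date like 2018-01-01 will not parse as an int.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the options in the question;
    # the real Exporter.py may define its options differently.
    parser = argparse.ArgumentParser(description="Do something.")
    parser.add_argument("--querysearch", type=str, required=True)
    parser.add_argument("--since", type=str, required=True)  # assumption: date stays a string
    parser.add_argument("--maxtweets", type=int, required=True)
    parser.add_argument("--output", type=str, required=True)
    return parser


retailers = ["handle1", "handle2"]
parsed = []
for retailer in retailers:
    # Every flag and every value is its own list element, all strings,
    # exactly as they would appear in sys.argv[1:].
    argv = [
        "--querysearch", "keyword @" + retailer,
        "--since", "2018-01-01",
        "--maxtweets", "50",
        "--output", "keyword_" + retailer,
    ]
    parsed.append(build_parser().parse_args(argv))

print(parsed[0])
```

Because the list elements are built from Python variables, the parser sees their values rather than the literal names string and file_name.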

I was going to use split() on

'--querysearch "keyword @retailer" --since 2018-01-01 --maxtweets 50 --output "keyword_retailer"'

but it won't handle the embedded space after 'keyword' (shlex.split can).
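To illustrate the difference, here is a small comparison of str.split and the standard library's shlex.split on that same string. The variable names are just for illustration.

```python
import shlex

command_line = (
    '--querysearch "keyword @retailer" --since 2018-01-01 '
    '--maxtweets 50 --output "keyword_retailer"'
)

# Plain split breaks on every space, including the one inside the quotes.
naive = command_line.split()

# shlex.split follows shell-like quoting rules, so the quoted phrase
# stays together as a single argument and the quotes are removed.
argv = shlex.split(command_line)

print(naive[:3])
print(argv[:3])
```

The list produced by shlex.split is in the form that parse_args expects.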

If you make all those arguments required, there's no point in providing defaults. Alternatively, provide the defaults and drop required=True. And defaults like 4 for arguments with type=str are not a good idea. They work, but could mess up further processing (is args.output a string or a number?).
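A short sketch of the type mismatch, assuming a cut-down parser with only two of the options: argparse applies the type function to a default only when that default is a string, so a numeric default on a type=str argument comes through unconverted.

```python
import argparse

parser = argparse.ArgumentParser()
# Pattern from the question: a str-typed option with a numeric default.
parser.add_argument("--output", type=str, default=4)
# Default that matches the declared type, with no required=True.
parser.add_argument("--maxtweets", type=int, default=50)

# Nothing supplied: the non-string default is used as is.
from_default = parser.parse_args([])
# Value supplied: it goes through type=str.
from_cli = parser.parse_args(["--output", "4"])

print(type(from_default.output).__name__)
print(type(from_cli.output).__name__)
```

So args.output can be an int in one run and a str in another, which is the kind of ambiguity that later code has to guard against.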

Another way to 'bypass' the parser is to define a Namespace object:

args = argparse.Namespace(querysearch='foo', maxtweets=4, output='afile', since=4)
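Applied to the loop in the question, that could look something like the following. The describe function is a hypothetical placeholder for whatever code consumes the options; it assumes that code only reads attributes such as args.querysearch and args.output.

```python
import argparse


def describe(args):
    # Hypothetical consumer: any function that reads attributes from
    # the parsed-arguments object can take a Namespace instead.
    return "{} -> {} (max {}, since {})".format(
        args.querysearch, args.output, args.maxtweets, args.since
    )


retailers = ["handle1", "handle2"]
summaries = []
for retailer in retailers:
    # Build the same kind of object parse_args would return, directly from variables.
    args = argparse.Namespace(
        querysearch="keyword @" + retailer,
        since="2018-01-01",
        maxtweets=50,
        output="keyword_" + retailer,
    )
    summaries.append(describe(args))

print(summaries)
```

This skips string parsing entirely, so values keep whatever Python type you give them.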

Upvotes: 1
