J Calbreath

Reputation: 2705

Spark: Using named arguments to submit application

Is it possible to write a Spark script that has arguments that can be referred to by name rather than by index in the args() array? I have a script that has 4 required arguments and, depending on their values, may require up to 3 additional arguments. For example, in one case args(5) might be a date I need to enter. In another, that date may end up in args(6) because of another argument I need.

Scalding has this implemented, but I don't see where Spark does.

Upvotes: 2

Views: 1897

Answers (2)

J Calbreath

Reputation: 2705

I actually overcame this pretty simply. You just need to preface each argument with a name and a delimiter, say "--", when you call your application:

spark-submit --class com.my.application --master yarn-client ./spark-myjar-assembly-1.0.jar input--hdfs:/path/to/myData output--hdfs:/write/to/yourData

Then include this line at the beginning of your code:

val namedArgs = args.map(_.split("--")).map(a => (a(0), a(1))).toMap

This converts the default args array into a Map called namedArgs (or whatever you want to call it). From there on, just refer to the Map and look up all of your arguments by name.
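For instance, using the argument names from the spark-submit call above, the lookups might look like this (with Map.get for the optional ones):

// Look up values by name; the keys come from the "name--value" pairs above
val input  = namedArgs("input")      // "hdfs:/path/to/myData"
val output = namedArgs("output")     // "hdfs:/write/to/yourData"
val date   = namedArgs.get("date")   // Option[String]: None if no "date--..." arg was passed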

Upvotes: 7

Jean Logeart

Reputation: 53809

Spark does not provide such functionality.

You can use Args from Scalding (if you don't mind the dependency for such a small class):

val args = Args(argsArr.toIterable)
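As a minimal sketch of how that is typically used (assuming com.twitter.scalding.Args is on the classpath and argsArr is the String array passed to main):

import com.twitter.scalding.Args

val args = Args(argsArr.toIterable)
val input = args("input")           // required: --input hdfs:/path/to/myData
val date  = args.optional("date")   // optional: Some(value) if --date was passed, otherwise None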

You can also use any CLI library that provides the parsing features you may want.
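For example, a sketch with scopt (assuming scopt 3.x; the option names here are just illustrative):

import scopt.OptionParser

case class Config(input: String = "", output: String = "", date: Option[String] = None)

val parser = new OptionParser[Config]("spark-myjar") {
  opt[String]("input").required().action((x, c) => c.copy(input = x)).text("HDFS input path")
  opt[String]("output").required().action((x, c) => c.copy(output = x)).text("HDFS output path")
  opt[String]("date").action((x, c) => c.copy(date = Some(x))).text("optional date")
}

parser.parse(args, Config()) match {
  case Some(config) => // run the Spark job with config.input, config.output, config.date
  case None         => sys.exit(1) // invalid arguments; scopt prints the usage message
}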

Upvotes: 0
