Reputation: 2705
Is it possible to write a Spark script that has arguments that can be referred to by name rather than by index in the args() array? I have a script with 4 required arguments and, depending on their values, up to 3 additional arguments may be required. For example, in one case args(5) might be a date I need to enter; in another, that date may end up in args(6) because of another argument I need.
Scalding has this implemented, but I don't see where Spark does.
Upvotes: 2
Views: 1897
Reputation: 2705
I actually overcame this pretty simply. You just need to preface each argument with a name and a delimiter, say "--", when you call your application:
spark-submit --class com.my.application --master yarn-client ./spark-myjar-assembly-1.0.jar input--hdfs:/path/to/myData output--hdfs:/write/to/yourData
Then include this line at the beginning of your code:
// Split each "name--value" argument on the delimiter and collect the pairs into a Map keyed by name
val namedArgs = args.map(_.split("--")).map(arr => (arr(0), arr(1))).toMap
This converts the default args array into a Map called namedArgs (or whatever you want to call it). From then on, just look up each argument in the Map by name.
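For example, a minimal sketch of the lookups (assuming the key names used in the spark-submit call above):

val inputPath  = namedArgs("input")    // "hdfs:/path/to/myData"
val outputPath = namedArgs("output")   // "hdfs:/write/to/yourData"
val maybeDate  = namedArgs.get("date") // Option[String], only defined when a date was passed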
Upvotes: 7
Reputation: 53809
Spark does not provide such functionality.
You can use Args from Scalding (if you don't mind the dependency for such a small class):
import com.twitter.scalding.Args

val args = Args(argsArr.toIterable)
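From there, a rough sketch of the lookups (assuming the usual "--key value" convention that Scalding's Args parses):

val input     = args("input")          // required; throws if the argument is missing
val maybeDate = args.optional("date")  // Option[String] for the conditional arguments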
You can also use any CLI library that provides the parsing features you may want.
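For instance, a minimal sketch using scopt's OptionParser (the option names and the Config fields below are just illustrative):

import scopt.OptionParser

case class Config(input: String = "", output: String = "", date: Option[String] = None)

val parser = new OptionParser[Config]("spark-myjar") {
  opt[String]("input").required().action((x, c) => c.copy(input = x)).text("HDFS input path")
  opt[String]("output").required().action((x, c) => c.copy(output = x)).text("HDFS output path")
  opt[String]("date").action((x, c) => c.copy(date = Some(x))).text("optional processing date")
}

parser.parse(args, Config()) match {
  case Some(config) =>
    // run the Spark job here using config.input, config.output and config.date
    println(s"input=${config.input} output=${config.output} date=${config.date}")
  case None =>
    sys.exit(1) // invalid arguments; scopt has already printed the usage message
}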
Upvotes: 0