Reputation: 1302
Is it possible to execute a Hadoop Streaming job that has no input file?
In my use case, I'm able to generate the necessary records for the reducer from a single mapper and its execution parameters. Currently I'm using a stub input file containing a single line; I'd like to remove this requirement.
We have 2 use cases in mind.
Upvotes: 3
Views: 1182
Reputation: 29317
No, it is not possible to execute a Hadoop Streaming job that has no input file.
The only two options that are required by mapred streaming are -input and -output.
From the Hadoop Streaming documentation:

mapred streaming [genericOptions] [streamingOptions]

where streaming options are one or more of:

-input <directoryname> or <filename>      Required (Input location for mapper)
-output <directoryname>                   Required (Output location for reducer)
-mapper <executable> or <JavaClassName>   Optional (Mapper executable. If not specified, IdentityMapper is used as the default)
-reducer <executable> or <JavaClassName>  Optional (Reducer executable. If not specified, IdentityReducer is used as the default)
[ . . . ] all other options are optional

So a minimal allowed MapReduce streaming job looks like this:
mapred streaming \
-input my_input \
-output my_output
This job will just echo the contents of my_input into my_output, where each line is converted into a <key>, <value> pair separated by a tab.
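To make that splitting rule concrete, here is a small Python sketch of the default behavior (illustrative only, not Hadoop's actual implementation): the text up to the first tab becomes the key, and the rest becomes the value; a line with no tab becomes a key with an empty value.

```python
def split_key_value(line):
    """Mimic Hadoop Streaming's default line handling: everything up
    to the first tab is the key, the remainder is the value. A line
    with no tab yields the whole line as the key and an empty value."""
    line = line.rstrip("\n")
    key, _sep, value = line.partition("\t")
    return key, value

print(split_key_value("apple\t42"))    # ('apple', '42')
print(split_key_value("no-tab-line"))  # ('no-tab-line', '')
```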
Upvotes: 0
Reputation: 1210
According to the docs this is not possible: -input and -output are required parameters for execution.
It looks like providing a dummy input file is the way to go currently.
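With the dummy-file approach, the mapper can simply discard its stdin and emit whatever records the reducer needs. A rough sketch of such a generator mapper (the record format and parameter here are hypothetical, not from the question):

```python
import io
import sys

def generate_records(stdin, stdout, n=3):
    """Consume and ignore the stub input, then emit n synthetic
    tab-separated key/value records for the reducer."""
    stdin.read()  # discard the single dummy line
    for i in range(n):
        stdout.write(f"key{i}\tvalue{i}\n")

# In the real mapper script you would call:
#   generate_records(sys.stdin, sys.stdout)
# and run the job as, e.g.:
#   mapred streaming -input stub.txt -output out -mapper gen_mapper.py
```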
Upvotes: 1