user403579

Reputation: 51

Can not run hadoop streaming job: Missing required options: input, output

I'm trying to run a streaming job on a cluster of DSE 3.1 analytics servers, using Cassandra CFs for input. The job complains that the input and output options are missing even though they are set (I only added them because of that complaint):

dse hadoop jar $HADOOP_HOME/lib/hadoop-streaming-1.0.4.8.jar \
-D cassandra.input.keyspace="tmp_ks" \
-D cassandra.input.partitioner.class="MurMur3Partitioner" \
-D cassandra.input.columnfamily="tmp_cf" \
-D cassandra.consistencylevel.read="ONE" \
-D cassandra.input.widerows=true \
-D cassandra.input.thrift.address=10.0.0.1
-inputformat org.apache.cassandra.hadoop.ColumnFamilyInputFormat \
-outputformat org.apache.hadoop.mapred.lib.NullOutputFormat \
-input /tmp_ks/tmp_cf \
-output /dev/null \
-mapper mymapper.py \
-reducer myreducer.py

Got "ERROR streaming.StreamJob: Missing required options: input, output". I've tried different inputs and outputs, different outputformats but got the same error.

What have I done wrong?

Upvotes: 2

Views: 731

Answers (3)

Gunslinger

Reputation: 1521

I also noticed something wrong with your command:

...    
-D cassandra.input.partitioner.class="MurMur3Partitioner" \
...

The class should be "Murmur3Partitioner"
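
So that part of the command should read (only the spelling corrected, everything else as you have it):

...
-D cassandra.input.partitioner.class="Murmur3Partitioner" \
...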

Upvotes: 0

Nonnib

Reputation: 468

I notice that this part of your command doesn't have a trailing backslash:

...
-D cassandra.input.thrift.address=10.0.0.1
...

Maybe that's screwing up the lines that follow? Without the trailing backslash the shell ends the command on that line, so the -input and -output options on the later lines never reach the streaming jar.
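
Something like this should keep the whole invocation on one logical line (only the missing backslash added, everything else exactly as in your command):

...
-D cassandra.input.thrift.address=10.0.0.1 \
-inputformat org.apache.cassandra.hadoop.ColumnFamilyInputFormat \
...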

Upvotes: 2

zhutoulala

Reputation: 4832

Input should be an existing path on HDFS, while output should be a path that does not yet exist on HDFS.
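
For example, you can sanity-check both paths from a DSE node before submitting (the output directory here is just a hypothetical unused HDFS path, used instead of /dev/null):

dse hadoop fs -ls /tmp_ks/tmp_cf        # input: should list an existing path
dse hadoop fs -ls /user/me/stream_out   # output: should NOT exist yet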

Upvotes: 1
