Reputation: 51
I'm trying to run a streaming job on a cluster of DSE 3.1 analytics servers, using Cassandra column families for input. The job complains that the input and output parameters are missing, even though they are set (I only set them because of that complaint):
dse hadoop jar $HADOOP_HOME/lib/hadoop-streaming-1.0.4.8.jar \
-D cassandra.input.keyspace="tmp_ks" \
-D cassandra.input.partitioner.class="MurMur3Partitioner" \
-D cassandra.input.columnfamily="tmp_cf" \
-D cassandra.consistencylevel.read="ONE" \
-D cassandra.input.widerows=true \
-D cassandra.input.thrift.address=10.0.0.1
-inputformat org.apache.cassandra.hadoop.ColumnFamilyInputFormat \
-outputformat org.apache.hadoop.mapred.lib.NullOutputFormat \
-input /tmp_ks/tmp_cf \
-output /dev/null \
-mapper mymapper.py \
-reducer myreducer.py
Got "ERROR streaming.StreamJob: Missing required options: input, output". I've tried different inputs and outputs, different outputformats but got the same error.
What I've done wrong?
Upvotes: 2
Views: 731
Reputation: 1521
I also noticed something wrong in your command:
...
-D cassandra.input.partitioner.class="MurMur3Partitioner" \
...
The class should be "Murmur3Partitioner"; class names are case-sensitive, so "MurMur3Partitioner" won't resolve.
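The corrected line would look like this (a sketch, leaving the rest of your command unchanged; the fully qualified name org.apache.cassandra.dht.Murmur3Partitioner should also work):
-D cassandra.input.partitioner.class="Murmur3Partitioner" \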
Upvotes: 0
Reputation: 468
I notice that this part of your command doesn't have a trailing backslash:
...
-D cassandra.input.thrift.address=10.0.0.1
...
That would explain the error: without the backslash the shell ends the command there, so none of the options on the following lines, including -input and -output, ever reach the streaming jar.
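A minimal sketch of the fix, adding only the missing continuation character:
...
-D cassandra.input.thrift.address=10.0.0.1 \
-inputformat org.apache.cassandra.hadoop.ColumnFamilyInputFormat \
...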
Upvotes: 2
Reputation: 4832
Input should be an existing path on HDFS, while output should be a path that does not yet exist on HDFS (the job creates it).
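For example, you could check both before submitting the job (a sketch; /tmp_out stands in for whatever output path you choose, and DSE 3.1 ships Hadoop 1.x, hence -rmr rather than -rm -r):
dse hadoop fs -ls /tmp_ks/tmp_cf    # input: must already exist
dse hadoop fs -rmr /tmp_out         # output: remove leftovers so the path doesn't exist yet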
Upvotes: 1