Reputation: 39
I want to run a Hadoop job whose map inputs come from both a file and Cassandra at the same time. Is that possible?
I know how to read input files from a directory, and how to read input data from Cassandra, but I am not sure whether reading from both in a single job is possible.
Here are a few more hints to describe my situation. The data format is the same in both sources.
A file looks like this:

    key value1 value2 value3 ...

A Cassandra row is structured like this:

    key column | column name1  | column name2  | column name3  ...
    key value  | column value1 | column value2 | column value3 ...
I need to extract a record from each of them and then compare the data by key. That way I can find duplicate keys, new keys, and deleted keys.
Thanks.
Upvotes: 0
Views: 144
Reputation: 1275
You can do this in two jobs. First, run a map-only job that pulls your Cassandra data into HDFS.
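A rough, untested sketch of that first job, using Cassandra's ColumnFamilyInputFormat and ConfigHelper. The keyspace and column family names, the addresses, and the output path are placeholders, and the ConfigHelper method names vary between Cassandra versions (these are from the 1.x line):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.SortedMap;

    import org.apache.cassandra.db.IColumn;
    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CassandraExport {

        // Writes each Cassandra row as one tab-separated text line:
        // key \t value1 \t value2 ...
        public static class ExportMapper
                extends Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, Text> {
            @Override
            protected void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns,
                    Context ctx) throws IOException, InterruptedException {
                StringBuilder values = new StringBuilder();
                for (IColumn col : columns.values()) {
                    if (values.length() > 0) values.append('\t');
                    values.append(ByteBufferUtil.string(col.value()));
                }
                ctx.write(new Text(ByteBufferUtil.string(key)), new Text(values.toString()));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "cassandra-export");
            job.setJarByClass(CassandraExport.class);
            job.setMapperClass(ExportMapper.class);
            job.setNumReduceTasks(0);    // map-only: mapper output goes straight to HDFS
            job.setInputFormatClass(ColumnFamilyInputFormat.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            // Placeholder cluster settings -- adjust to your setup.
            ConfigHelper.setInputInitialAddress(job.getConfiguration(), "localhost");
            ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
            ConfigHelper.setInputPartitioner(job.getConfiguration(),
                    "org.apache.cassandra.dht.RandomPartitioner");
            ConfigHelper.setInputColumnFamily(job.getConfiguration(),
                    "MyKeyspace", "MyColumnFamily");
            // Ask for every column of every row.
            SlicePredicate predicate = new SlicePredicate().setSlice_range(new SliceRange(
                    ByteBufferUtil.EMPTY_BYTE_BUFFER, ByteBufferUtil.EMPTY_BYTE_BUFFER,
                    false, Integer.MAX_VALUE));
            ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

            FileOutputFormat.setOutputPath(job, new Path("/tmp/cassandra_export"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }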
Then use the "MultipleInputs" class's "addInputPath" method to specify the two locations you want your data to come from: http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/MultipleInputs.html
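The linked page is the old mapred API; newer releases have an equivalent under org.apache.hadoop.mapreduce.lib.input. A driver sketch for the second job, with placeholder paths, and with the mapper/reducer classes (my own names) sketched after the next paragraph:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CompareDriver {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "compare-file-vs-cassandra");
            job.setJarByClass(CompareDriver.class);

            // One input path per source; both are plain text, so TextInputFormat
            // works for both, and each path gets its own mapper class.
            MultipleInputs.addInputPath(job, new Path("/data/original_file"),
                    TextInputFormat.class, FileSideMapper.class);
            MultipleInputs.addInputPath(job, new Path("/tmp/cassandra_export"),
                    TextInputFormat.class, CassandraSideMapper.class);

            job.setReducerClass(CompareReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileOutputFormat.setOutputPath(job, new Path("/data/compare_output"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Registering a mapper per path is just one way to tell the sources apart; you can equally register a single mapper for both paths and branch on the data, as described below.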
Either way, in the map of your second job you can branch on which input a record came from, based on the data you are seeing (for example, have the first column of each Cassandra line say "cassandra" and recognize that in the map class of the second job), and clean the records up (make them uniform) before they go to the reducer.
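To make the comparison concrete, here is a sketch of two tagging mappers and a key-classifying reducer. The "F"/"C" tags, the parsing, and the assumption that the file is the older snapshot (so file-only keys count as deleted) are all my own:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Tags each line of the original file with "F": emits (key, "F" + values).
    class FileSideMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split("\\s+", 2);  // key value1 value2 ...
            ctx.write(new Text(parts[0]),
                      new Text("F\t" + (parts.length > 1 ? parts[1] : "")));
        }
    }

    // Tags each line of the Cassandra export (tab-separated, see job 1) with "C".
    class CassandraSideMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split("\t", 2);
            ctx.write(new Text(parts[0]),
                      new Text("C\t" + (parts.length > 1 ? parts[1] : "")));
        }
    }

    // All records for one key arrive together, so the reducer can tell whether
    // the key exists in both sources ("duplicate"), only in the file
    // ("deleted", assuming the file is the older snapshot), or only in
    // Cassandra ("new"). Comparing the values of duplicate keys would go here too.
    class CompareReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            boolean inFile = false, inCassandra = false;
            for (Text v : values) {
                if (v.toString().startsWith("F")) inFile = true;
                else if (v.toString().startsWith("C")) inCassandra = true;
            }
            String status = inFile && inCassandra ? "duplicate"
                          : inFile ? "deleted"
                          : "new";
            ctx.write(key, new Text(status));
        }
    }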
Upvotes: 1