Jun Young Kim

Reputation: 39

How can we do a map operation over a file and Cassandra at the same time?

I want to run a Hadoop job that maps over inputs coming from both a file and Cassandra at the same time. Is this possible?

I know how to read input files from a directory, and how to read input data from Cassandra.

However, I am not sure whether it is possible to take input from both sources in one job.

Here are some more details about my situation. The data format is the same in both sources.

A file looks like this: key value1 value2 value3 ...

A Cassandra row is structured like this:
key column | column name 1 | column name 2 | column name 3 ...
key value | column value 1 | column value 2 | column value 3 ...

I need to read a line from each source and compare the data by key, so that I can find duplicate keys, new keys, and deleted keys.

thanks.

Upvotes: 0

Views: 144

Answers (1)

Joe Stein

Reputation: 1275

You can do this in two jobs. First, run a map-only job that pulls your Cassandra data into HDFS.
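A rough sketch of what that first, map-only job might look like, using the pre-CQL ColumnFamilyInputFormat/ConfigHelper API (the exact ConfigHelper setters vary between Cassandra releases); the keyspace, column family, host, and output path here are placeholders. Each Cassandra row is written out as a text line tagged with its source, which the second job relies on:

```java
import java.nio.ByteBuffer;
import java.util.SortedMap;

import org.apache.cassandra.db.IColumn;
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class CassandraDumpJob {

    // Each Cassandra row becomes one text line: "cassandra<TAB>key<TAB>v1 v2 v3 ..."
    // The leading "cassandra" tag is what the second job's mapper looks for.
    public static class DumpMapper
            extends Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, Text> {

        @Override
        protected void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
                throws java.io.IOException, InterruptedException {
            StringBuilder values = new StringBuilder();
            for (IColumn column : columns.values()) {
                if (values.length() > 0) values.append(' ');
                values.append(ByteBufferUtil.string(column.value()));
            }
            context.write(new Text("cassandra\t" + ByteBufferUtil.string(key)),
                          new Text(values.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "cassandra-to-hdfs");
        job.setJarByClass(CassandraDumpJob.class);

        // Cassandra input configuration -- host, port, partitioner, keyspace/CF are placeholders
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        ConfigHelper.setInputInitialAddress(job.getConfiguration(), "localhost");
        ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
        ConfigHelper.setInputPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
        ConfigHelper.setInputColumnFamily(job.getConfiguration(), "MyKeyspace", "MyColumnFamily");
        SlicePredicate predicate = new SlicePredicate().setSlice_range(
                new SliceRange(ByteBufferUtil.EMPTY_BYTE_BUFFER, ByteBufferUtil.EMPTY_BYTE_BUFFER,
                               false, Integer.MAX_VALUE));
        ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

        job.setMapperClass(DumpMapper.class);
        job.setNumReduceTasks(0);            // map-only: write straight to HDFS as text
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path("/data/cassandra-dump"));

        job.waitForCompletion(true);
    }
}
```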

Then use the MultipleInputs class's addInputPath method to specify the two locations you want your data to come from: http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/lib/MultipleInputs.html
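The second job's driver could register the two inputs something like this, using the old org.apache.hadoop.mapred API to match the linked Javadoc; the paths are placeholders, and CompareMapper/CompareReducer are illustrative classes sketched further below:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class CompareDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CompareDriver.class);
        conf.setJobName("compare-file-and-cassandra");

        // Both inputs are plain text at this point: the original flat file and
        // the Cassandra dump written to HDFS by the first job.
        MultipleInputs.addInputPath(conf, new Path("/data/flat-file"),
                TextInputFormat.class, CompareMapper.class);
        MultipleInputs.addInputPath(conf, new Path("/data/cassandra-dump"),
                TextInputFormat.class, CompareMapper.class);

        conf.setReducerClass(CompareReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(conf, new Path("/data/compare-output"));

        JobClient.runJob(conf);
    }
}
```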

Then, in the map phase of your second job, you can branch on where the input came from based on the data you see (for example, have the first column of the Cassandra dump say "cassandra" and recognize that in the map class of the second job) and clean it up (make it uniform) before it goes to the reducer.
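A minimal sketch of that branching plus the per-key comparison, assuming the tab-separated "cassandra" tag written by the first job (each class in its own file; all names are illustrative):

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CompareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable offset, Text line, OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
        String[] fields = line.toString().split("\t", 3);
        if (fields.length >= 3 && "cassandra".equals(fields[0])) {
            // Cassandra dump line: "cassandra<TAB>key<TAB>values" -- keep the source tag on the value
            out.collect(new Text(fields[1]), new Text("cassandra\t" + fields[2]));
        } else {
            // Flat file line: "key value1 value2 value3 ..."
            String[] parts = line.toString().split("\\s+", 2);
            out.collect(new Text(parts[0]),
                        new Text("file\t" + (parts.length > 1 ? parts[1] : "")));
        }
    }
}

public class CompareReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
        String fromFile = null;
        String fromCassandra = null;
        while (values.hasNext()) {
            String[] tagged = values.next().toString().split("\t", 2);
            String value = tagged.length > 1 ? tagged[1] : "";
            if ("cassandra".equals(tagged[0])) {
                fromCassandra = value;
            } else {
                fromFile = value;
            }
        }
        // Classify each key: present in both sources, only in the file, or only in Cassandra.
        if (fromFile != null && fromCassandra != null) {
            out.collect(key, new Text(fromFile.equals(fromCassandra) ? "SAME" : "CHANGED"));
        } else if (fromFile != null) {
            out.collect(key, new Text("ONLY_IN_FILE"));
        } else {
            out.collect(key, new Text("ONLY_IN_CASSANDRA"));
        }
    }
}
```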

Upvotes: 1
