Aniruddha Sinha


Joins using Custom File Format

If I want to perform a reduce-side join using a custom file format, how should I implement it, particularly the RecordReader?
Say I have to fetch data from two datasets:
One from the Customers table (customerid, fname, lname, age, profession)
One from the Transactions table (transId, transdate, customerId, itemPurchased1, itemPurchased2, city, state, methodOfPayment)


To fetch data from two datasets, I need two mappers. Can I have two record readers for the two mappers? If so, how?
Please explain, including the driver implementation. If it isn't possible, please suggest a way to implement a reduce-side join using a custom file format.

Thank you in advance :)


Answers (1)

Ramzy


You want to join two data sets with a reduce-side join.

You need two mappers, because the two data sets have different layouts and need separate parsing. When writing output, each mapper should emit the join attribute (the customer ID in your case) as the key and the record as the value. You can also drop fields you don't need at this stage, which reduces the amount of data shuffled to the reducers. The important part is to prefix each value with a tag such as "set1:" + value, so that the reducer can identify which mapper the record came from.
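To make the tagging step concrete, here is a minimal plain-Java sketch of what each mapper does with one record, with no Hadoop dependency so it runs on its own. It assumes comma-delimited input (customerid is field 0 of a customer row, customerId is field 2 of a transaction row). The class, method, and tag constant names are hypothetical, chosen only for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical plain-Java sketch of the mapper-side tagging step (no Hadoop).
// Assumes comma-delimited records; tag constants are illustrative only.
class TaggingMapperSketch {
    static final String TAG_CUSTOMERS = "set1:";
    static final String TAG_TRANSACTIONS = "set2:";

    // Returns (join key, tagged value), i.e. what the mapper would pass to context.write.
    static Map.Entry<String, String> tag(String record, int keyIndex, String tag) {
        String[] fields = record.split(",");
        String custId = fields[keyIndex].trim();   // the join attribute becomes the key
        return new SimpleEntry<>(custId, tag + record); // the tag tells the reducer the source
    }

    public static void main(String[] args) {
        // Customers: customerid is field 0
        System.out.println(tag("101,Jane,Doe,34,Engineer", 0, TAG_CUSTOMERS));
        // Transactions: customerId is field 2
        System.out.println(tag("T9,2016-01-05,101,Pen,Ink,Austin,TX,Card", 2, TAG_TRANSACTIONS));
    }
}
```

Both records end up under the same key ("101"), which is what brings them together in one reduce call.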

In the reducer, the key is the customer ID and the list of values contains the records from both sets for that customer, so you can join them there as your requirements dictate.
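The reduce step for a single key can be sketched in plain Java as below, again without Hadoop. It assumes the "set1:" / "set2:" tags from the mappers and performs an inner join (one output row per customer/transaction pair). The class and method names are hypothetical. Because values reach the reducer in no guaranteed order, each side is buffered first and paired afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java sketch of a reduce-side inner join for ONE key (no Hadoop).
// Assumes values were tagged "set1:" (customers) or "set2:" (transactions) by the mappers.
class ReduceJoinSketch {
    static List<String> join(String custId, Iterable<String> values) {
        List<String> customers = new ArrayList<>();
        List<String> transactions = new ArrayList<>();
        // Values arrive in no guaranteed order, so buffer each side by its tag first.
        for (String v : values) {
            if (v.startsWith("set1:")) {
                customers.add(v.substring("set1:".length()));
            } else if (v.startsWith("set2:")) {
                transactions.add(v.substring("set2:".length()));
            }
        }
        // Inner join: one row per (customer, transaction) pair sharing this key.
        List<String> joined = new ArrayList<>();
        for (String c : customers) {
            for (String t : transactions) {
                joined.add(custId + "\t" + c + "\t" + t);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
            "set2:T9,2016-01-05,101,Pen,Ink,Austin,TX,Card",
            "set1:101,Jane,Doe,34,Engineer",
            "set2:T12,2016-02-11,101,Book,Lamp,Austin,TX,Cash");
        join("101", values).forEach(System.out::println);
    }
}
```

A key with only transactions (or only a customer) produces no rows here. For an outer join you would emit the unmatched side instead.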

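Once the two mappers are written, you register them with the job in the driver using MultipleInputs, which lets each input path have its own InputFormat and Mapper. Because the third argument is an InputFormat class, you can pass your own custom InputFormat for either path, and its RecordReader will be used to read that input: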

MultipleInputs.addInputPath(job, new Path("inputPath1"), TextInputFormat.class, com.abc.HBaseMapper1.class);
MultipleInputs.addInputPath(job, new Path("inputPath2"), TextInputFormat.class, com.abc.HBaseMapper2.class);

From a performance standpoint, if one of the tables is small, you can load it into every mapper through the distributed cache and join the other data set against it in the mapper (a map-side join), which avoids the shuffle entirely.
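The map-side alternative boils down to a hash join: build an in-memory lookup from the small table once, then stream the large table past it. The plain-Java sketch below shows only that logic. In Hadoop you would typically register the small file with Job.addCacheFile in the driver and build the map in Mapper.setup(), but that wiring is omitted here. Class and method names are hypothetical, and comma-delimited input is assumed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch of a map-side (replicated hash) join (no Hadoop).
// loadSmallTable stands in for reading the cached file in setup(); joinRecord for map().
class MapSideJoinSketch {
    // Build customerid -> customer record from the small table (customerid is field 0).
    static Map<String, String> loadSmallTable(List<String> customerLines) {
        Map<String, String> byId = new HashMap<>();
        for (String line : customerLines) {
            byId.put(line.split(",")[0].trim(), line);
        }
        return byId;
    }

    // Called once per transaction record; customerId is field 2.
    // Returns the joined row, or null when there is no matching customer (inner join).
    static String joinRecord(Map<String, String> customers, String transaction) {
        String custId = transaction.split(",")[2].trim();
        String customer = customers.get(custId);
        return customer == null ? null : custId + "\t" + customer + "\t" + transaction;
    }

    public static void main(String[] args) {
        Map<String, String> customers = loadSmallTable(
            Arrays.asList("101,Jane,Doe,34,Engineer", "102,Raj,Rao,41,Doctor"));
        for (String t : Arrays.asList("T9,2016-01-05,101,Pen", "T10,2016-01-06,999,Pen")) {
            String row = joinRecord(customers, t);
            if (row != null) {
                System.out.println(row);
            }
        }
    }
}
```

This only pays off when the small table comfortably fits in each mapper's memory. Otherwise the reduce-side join is the safer choice.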

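In Mapper 1, extract the customer ID from the row and use it as the key (this assumes comma-delimited input, with customerid as the first field):

String custId = value.toString().split(",")[0];  // assumes comma-delimited; customerid is field 0
context.write(new Text(custId), new Text("@@map1@@|" + value));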

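In Mapper 2, do the same, taking customerId from the third field of the transaction row:

String custId = value.toString().split(",")[2];  // assumes comma-delimited; customerId is field 2
context.write(new Text(custId), new Text("@@map2@@|" + value));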

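In the reducer, check each value's tag to tell the two sources apart:

StringBuilder customer = new StringBuilder();
StringBuilder transactions = new StringBuilder();
for (Text txt : values) {
    String record = txt.toString();
    if (record.startsWith("@@map1@@|")) {
        // Customer record from Mapper 1: strip the tag and append your output string
        customer.append(record.substring("@@map1@@|".length()));
    } else if (record.startsWith("@@map2@@|")) {
        // Transaction record from Mapper 2: strip the tag and append your output string
        transactions.append(record.substring("@@map2@@|".length())).append(';');
    }
}
// Values arrive in no guaranteed order, so each side is built separately before writing
context.write(key, new Text(customer + "\t" + transactions));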

