Reputation: 124
I have the sample data below, which I'm using to learn Hadoop MapReduce. It is a list of follower/followee pairs.
Follower,followee
a,b
a,c
a,d
c,b
b,d
d,a
b,c
b,e
e,f
For example, a is following b, a is following c, and so on.
I'm trying to process the data so that if a is following b and b is also following a, then a,b appears in the output text file. I'm new to MapReduce and am looking for a way to get the result below.
a,d
c,b
Upvotes: 2
Views: 101
Reputation: 6343
You can achieve this using a trick.
The trick is to build the mapper's output key so that both (a,d) and (d,a) get the same key and therefore end up in the same reducer call:
When (a,d) comes:
'a' < 'd', hence emit:
key => a,d
value => a,d
When (d,a) comes:
'd' > 'a', hence emit:
key => a,d
value => d,a
The key is always formed so that the alphabetically smaller name comes first. So for both records, the key is "a,d".
So the output of the mapper will be:
Record: a,b
Key = a,b Value = a,b
Record: a,c
Key = a,c Value = a,c
Record: a,d
Key = a,d Value = a,d
Record: c,b
Key = b,c Value = c,b
Record: b,d
Key = b,d Value = b,d
Record: d,a
Key = a,d Value = d,a
Record: b,c
Key = b,c Value = b,c
Record: b,e
Key = b,e Value = b,e
Record: e,f
Key = e,f Value = e,f
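The mapper output listed above can be reproduced with a few lines of plain Java. This is only a sketch of the map step, not Hadoop API code: the class name MapStepSketch and the helpers canonicalKey and map are made up for illustration, and the input is assumed to contain only data lines (the "Follower,followee" header already removed).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MapStepSketch {
    // Builds the shared key: the smaller name (ignoring case) goes first.
    static String canonicalKey(String follower, String followee) {
        return follower.compareToIgnoreCase(followee) <= 0
                ? follower + "," + followee
                : followee + "," + follower;
    }

    // Stand-in for the map step: one "key<TAB>value" line per input record.
    // The value is the original record, so the direction is preserved.
    static List<String> map(List<String> records) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            String[] parts = record.split(",");
            if (parts.length != 2) {
                continue; // skip malformed lines
            }
            String key = canonicalKey(parts[0].trim(), parts[1].trim());
            out.add(key + "\t" + record);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : map(Arrays.asList("a,d", "d,a", "c,b", "b,c"))) {
            System.out.println(line);
        }
    }
}
```

In a real job the same key-building logic would sit inside the mapper's map method, with the key and the original record written out as the key/value pair.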
After the shuffle and sort, the values arrive at the reducer grouped by key, in the following order:
Record 1:
Key = a,b Value = a,b
Record 2:
Key = a,c Value = a,c
Record 3:
Key = a,d Value = a,d
Key = a,d Value = d,a
Record 4:
Key = b,c Value = c,b
Key = b,c Value = b,c
Record 5:
Key = b,d Value = b,d
Record 6:
Key = b,e Value = b,e
Record 7:
Key = e,f Value = e,f
So, in the reducer, you only need to emit the groups that contain two values, one for each direction. Here those are Records 3 and 4:
Record 3:
Key = a,d Value = a,d
Key = a,d Value = d,a
Record 4:
Key = b,c Value = c,b
Key = b,c Value = b,c
So, the output will be:
a,d
c,b
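The whole flow on the sample data can be sketched in plain Java as follows. This is a simulation under stated assumptions, not Hadoop API code: the class name MutualFollowSketch and the helpers canonicalKey, shuffle and reduce are hypothetical, a TreeMap stands in for the framework's sort-and-group phase, and the header line is assumed to be removed already. Values are kept in a set so that a duplicated input line such as a,b appearing twice is not mistaken for a mutual follow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class MutualFollowSketch {
    // Smaller name (ignoring case) first, so a,d and d,a share the key "a,d".
    static String canonicalKey(String a, String b) {
        return a.compareToIgnoreCase(b) <= 0 ? a + "," + b : b + "," + a;
    }

    // Stand-in for map + shuffle: groups original records by key, in key order.
    static Map<String, Set<String>> shuffle(List<String> records) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String record : records) {
            String[] p = record.split(",");
            groups.computeIfAbsent(canonicalKey(p[0], p[1]),
                    k -> new LinkedHashSet<>()).add(record);
        }
        return groups;
    }

    // Stand-in for reduce: a pair is mutual when its group holds two
    // distinct values, i.e. both directions were seen.
    static List<String> reduce(Map<String, Set<String>> groups) {
        List<String> out = new ArrayList<>();
        for (Set<String> values : groups.values()) {
            if (values.size() == 2) {
                out.add(values.iterator().next()); // emit the first direction seen
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "a,b", "a,c", "a,d", "c,b", "b,d", "d,a", "b,c", "b,e", "e,f");
        for (String pair : reduce(shuffle(input))) {
            System.out.println(pair); // prints a,d then c,b
        }
    }
}
```

In an actual reducer the same check would be done while iterating over the values passed in for each key.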
This logic also works if you have names instead of single letters. For example, you can use the following logic on the mapper side (where s1 is the first string and s2 is the second string):
String key = "";
// Put the smaller string (ignoring case) first so both directions share one key.
int compare = s1.compareToIgnoreCase(s2);
if (compare <= 0)
    key = s1 + "," + s2;
else
    key = s2 + "," + s1;
So, if you have:
String s1 = "Stack";
String s2 = "Overflow";
the key will be:
Overflow,Stack
Similarly, if you have:
s1 = "Overflow";
s2 = "Stack";
the key will still be:
Overflow,Stack
Upvotes: 3