Reputation: 121
The relation Joined_Usage is obtained by a full outer join of multiple relations, and the keys from those relations are in $0, $6, $12, $18, $24 and $30 of Joined_Usage. Within each record, the keys that are not null are all equal.
I want to derive a new relation Usage from Joined_Usage such that $0 of Usage holds the key of the corresponding record in Joined_Usage (i.e. a key value that is not null).
I am using the following code:
Usage = foreach Joined_Usage generate ($0 is not null ? $0 :
($6 is not null ? $6 :
($12 is not null ? $12 :
($18 is not null ? $18 :
($24 is not null ? $24 : $30)
)
)
)
);
However, when I count the number of records in Usage with:
b = group Usage all; c = foreach b generate COUNT(Usage);
the count comes out much higher than that of Joined_Usage, which implies that records are being duplicated. I do not understand how this is happening. Please help!
Upvotes: 0
Views: 814
Reputation: 1177
You should do some data profiling: have you compared COUNT_STAR with COUNT over the DISTINCT key values? Are you sure you are not joining relations that contain duplicate keys? You can test for that by grouping each relation on its join key before doing the actual join (and using MAX to pick a single value for the fields that aren't keys).
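A minimal sketch of those checks in Pig Latin, under stated assumptions: A is a hypothetical name for one of the relations fed into the join, $0 is assumed to be its join key, and $1 stands in for a non-key field of a type MAX accepts. None of these names come from the question, so adapt them to the real inputs.

```pig
-- A is a hypothetical input to the join; $0 is assumed to be its join key
-- and $1 a non-key field (both are placeholders, not taken from the question).
A_by_key   = GROUP A BY $0;

-- Keys that occur more than once multiply rows in the join output.
A_key_cnt  = FOREACH A_by_key GENERATE group AS key, COUNT_STAR(A) AS n;
A_dup_keys = FILTER A_key_cnt BY n > 1;
DUMP A_dup_keys;

-- Collapse to one row per key, keeping MAX of the non-key field.
A_dedup    = FOREACH A_by_key GENERATE group AS key, MAX(A.$1) AS f1;

-- Profile the derived relation: total rows versus distinct key values.
Usage_all  = GROUP Usage ALL;
Usage_prof = FOREACH Usage_all {
    uniq_keys = DISTINCT Usage.$0;
    GENERATE COUNT_STAR(Usage) AS total_rows, COUNT(uniq_keys) AS distinct_keys;
};
DUMP Usage_prof;
```

If A_dup_keys is non-empty for any input, or total_rows exceeds distinct_keys, duplicate join keys are a likely cause of the inflated count. Note that COUNT skips tuples whose first field is null, while COUNT_STAR counts every tuple.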
Upvotes: 0