manojpt

Reputation: 1269

Apache Pig query to join two schemas

For example,

  relation1: {a: chararray, b: chararray}
  (1, abc)
  (2, asd)
  relation2: {a: chararray, c: chararray}
  (1, 2.5)
  (2, 4.0)

The question is: is it possible to get a resultant relation with the schema below? For example, relation1 has 2 tuples and relation2 has 2 tuples, so the resultant relation should also have exactly 2 tuples.

  relation3: {a: chararray, b: chararray, c: chararray}
  (1, abc, 2.5)
  (2, asd, 4.0)

Could anyone help me solve this?

Upvotes: 0

Views: 224

Answers (1)

TheCowGoesMoo

Reputation: 176

joined = JOIN relation1 BY a, relation2 BY a;
describe joined;
-- joined: {relation1::a: chararray,relation1::b: chararray,relation2::a: chararray,relation2::c: chararray}
relation3 = FOREACH joined GENERATE relation1::a AS a, b, c;
describe relation3;
-- relation3: {a: chararray, b: chararray, c: chararray}

The '::' is called the disambiguate operator. It is used to resolve field-name conflicts in a schema after certain operations (such as a join). After a join, all of the fields from each relation are still present, and you can pull out the ones you need using a FOREACH.
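For completeness, here is a minimal end-to-end sketch of the same approach. The file names and the comma delimiter are assumptions for illustration only; substitute however you actually load your data. Note that an inner join produces one output tuple per matching pair, so the result has 2 tuples here only because each value of a appears once in each relation. If a key were repeated on either side, you would get more rows.

```pig
-- Hypothetical input files and delimiter (assumptions, not from the question).
relation1 = LOAD 'relation1.csv' USING PigStorage(',') AS (a:chararray, b:chararray);
relation2 = LOAD 'relation2.csv' USING PigStorage(',') AS (a:chararray, c:chararray);

-- Inner join on the shared key a.
joined = JOIN relation1 BY a, relation2 BY a;

-- Keep a single copy of the key; '::' selects which relation each field comes from.
relation3 = FOREACH joined GENERATE relation1::a AS a, relation1::b AS b, relation2::c AS c;

-- Print the joined tuples (output order is not guaranteed).
DUMP relation3;
```

Qualifying b and c with '::' is optional since those names are unique after the join, but spelling them out makes it clear where each field comes from.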

Upvotes: 2
