user3216394
user3216394

Reputation: 43

Creating sequence number in hive or Pig

I'm facing a data transformation issue : I have the table here under with 3 columns : client, event, timestamp. And I basically want to assign a sequence number to all events for a given client based on timestamp, which is basically the "Sequence" columns I added hereunder.

Client  Event   TimeStamp                Sequence
C1      Ph      2014-01-30 12:15:23      1
C1      Me      2014-01-31 15:11:34      2
C1      Me      2014-01-31 17:16:05      3

C2      Me      2014-02-01 09:22:52      1
C2      Ph      2014-02-01 17:22:52      2

I can't figure out how to create this sequence number in hive or Pig. Would you have any clue ?

Thanks in advance !

Guillaume

Upvotes: 2

Views: 2029

Answers (2)

Gaurav Phapale
Gaurav Phapale

Reputation: 979

Put all the records in a bag (by say grouping all), sort the tuples inside bag by TimeStamp field and then use Enumerate function.

Something like below (I did not execute the code, so you might need to clean it up a bit):

// assuming input contains 3 columns - client, event, timestamp
input2 = GROUP input all;
input3 = FOREACH input2
         {
             sorted = ORDER input BY timestamp;
             sorted2 = Enumerate(sorted);
             GENERATE FLATTEN(sorted2);
         }

Upvotes: 4

eleven
eleven

Reputation: 1

We eventually modified enumerate source the following way and it works great :

public void accumulate(Tuple arg0) throws IOException {
nevents=13;
i=nevents+1;
DataBag inputBag = (DataBag)arg0.get(0);
  Tuple t2 = TupleFactory.getInstance().newTuple();
for (Tuple t : inputBag) {
  Tuple t1 = TupleFactory.getInstance().newTuple(t.getAll());
  tampon=t1.get(2).toString();
  if (tampon.equals("NA souscription Credit Conso")) {
    if (i <= nevents) {
      outputBag.add(t2);
      t2=TupleFactory.getInstance().newTuple();
     }
    i=0;
    t2.append(t1.get(0).toString());
    t2.append(t1.get(1).toString());
    t2.append(t1.get(2).toString());
    i++;
   }
   else if (i < nevents) {
    t2.append(tampon);
    i++;
   }
   else if (i == nevents) {
    t2.append(tampon);
    outputBag.add(t2);
    i++;
    t2=TupleFactory.getInstance().newTuple();
   }
  if (count % 1000000 == 0) {
    outputBag.spill();
    count = 0;
  }
  ;
  count++;
}
if (t2.size()!=0) {
  outputBag.add(t2);
}
}

Upvotes: 0

Related Questions