RAKESH REDDY
RAKESH REDDY

Reputation: 33

Multiple ORDER by on Desc in pig

I would like to get the latest date for the cid, also latest amount on the same date. For latest date I implemented as below

    A = LOAD '$input' AS (cid:chararray, date:chararray, amt:chararray,tid:chararray, time:chararray);
    B = FOREACH (GROUP A BY (cid,tid)) {
    sort = ORDER A BY date DESC;
    latest = LIMIT sort 1;
    GENERATE FLATTEN(newest);`enter code here`
   };'

But I want the latest amount, for that I have multiple records on the same date, so tried to get the amount by ordering on time like below.

    AMT = FOREACH (GROUP B BY (cid,tid)){
    sort1 = ORDER B BY time DESC;
    lastamt = LIMIT sort1 1;
    GENERATE FLATTEN(lastamt.amt);
  };

I/p :

 9822736906^A2015-08-02^A146.08^A^A21:57:05.000000
 9822736906^A2015-08-02^A250.12^A58926968^A22:45:30.000000
 9822736906^A2015-08-02^A132.1^A00000000^A22:55:29.000000
 9822736906^A2015-08-02^A60.97^A00000000^A23:02:48.000000
 9826964132^A2015-08-05^A98.2^A^A23:05:46.000000
 9822736906^A2015-08-05^A85.71^A4F7581^A23:12:22.000000
 9822736906^A2015-08-05^A655.73^A00000000^A23:17:24.000000

O/p should be

9822736906^A2015-08-05^A655.73^A00000000^A23:17:24.000000 
9826964132^A2015-08-05^A98.2^A^A23:05:46.000000

9822736906^A2015-08-02^A60.97^A00000000^A23:02:48.000000

Upvotes: 3

Views: 10030

Answers (1)

Murali Rao
Murali Rao

Reputation: 2287

If the objective is to select latest record for a cid then the below snippet will work.

Order by date and time in desc order in the same ORDER BY operator.

Input :

9822736906  2015-08-02  146.08      21:57:05.000000
9822736906  2015-08-02  250.12  58926968    22:45:30.000000
9822736906  2015-08-02  132.1   00000000    22:55:29.000000
9822736906  2015-08-02  60.97   00000000    23:02:48.000000
9826964132  2015-08-05  98.2        23:05:46.000000
9822736906  2015-08-05  85.71   4F7581  23:12:22.000000
9822736906  2015-08-05  655.73  00000000    23:17:24.000000

Pig script :

A = LOAD 'a.csv' USING PigStorage('\t') AS (cid:chararray, date:chararray, amt:chararray,tid:chararray, time:chararray);
B = GROUP A BY cid;
C = FOREACH B {
    sort = ORDER A BY date DESC, time DESC;
    latest = LIMIT sort 1;
    GENERATE FLATTEN(latest);
   };

Output : DUMP C :

(9822736906,2015-08-05,655.73,00000000,23:17:24.000000)
(9826964132,2015-08-05,98.2,,23:05:46.000000) 

Upvotes: 4

Related Questions