Efrain Valles
Efrain Valles

Reputation: 11

PANDAS: a way to combine rows that are grouped by a field

I have a DataFrame that looks like:

test1 = pd.DataFrame( {
    "ROUTE" : ["MIA-ORD", "MIA-AUA", "ORD-MIA", "MIA-HOU", "MIA-JFK", "JFK-MIA", "JFK-YYZ"],
    "TICKET" : ["123", "345", "123", "678", "456", "345", "456"],
    "COUPON" : [1,4,2,1,1,3,2],
    "PAX" : ["Jessica", "Alex", "Jessica", "Jamanica", "Ernest","Alex", "Ernest"],
    "PAID": [100.00,200.00,100.00,100.00,200.00,200.00,200.00]})

this gives me

     ROUTE TICKET  COUPON       PAX   PAID
0  MIA-ORD    123       1   Jessica  100.0
1  MIA-AUA    345       4      Alex  200.0
2  ORD-MIA    123       2   Jessica  100.0
3  MIA-HOU    678       1  Jamanica  100.0
4  MIA-JFK    456       1    Ernest  200.0
5  JFK-MIA    345       3      Alex  200.0
6  JFK-YYZ    456       2    Ernest  200.0

what I am trying to do is to combine the Route and coupon data to be

     ROUTE          TICKET     COUPON       PAX     PAID
0  MIA-ORD-ORD-MIA    123       1-2       Jessica   100.0
1  JFK-MIA-MIA-AUA    345       3-4         Alex    200.0
2  MIA-HOU            678       1         Jamanica  100.0
3  MIA-JFK-JFK-YYZ    456       1-2        Ernest   200.0

So Far I have been able to group by ticket since its the obivous common identifier and sorted Coupons since the order of the flights for 'ALEX' are inverted.

rs1 = test1.groupby(['TICKET']).apply(pd.DataFrame.sort_values,'COUPON')

This results

            ROUTE TICKET  COUPON       PAX   PAID
TICKET                                           
123    0  MIA-ORD    123       1   Jessica  100.0
       2  ORD-MIA    123       2   Jessica  100.0
345    5  JFK-MIA    345       3      Alex  200.0
       1  MIA-AUA    345       4      Alex  200.0
456    4  MIA-JFK    456       1    Ernest  200.0
       6  JFK-YYZ    456       2    Ernest  200.0
678    3  MIA-HOU    678       1  Jamanica  100.0

but from here i cannot merge the ROUTE and COUPON.

I have tried:

st1=test1.groupby('TICKET').apply(lambda group: ','.join(group['ROUTE']))

But that only brings about the merged colmuns sorted alone. not the rest of the data.

TICKET
123    MIA-ORD,ORD-MIA
345    MIA-AUA,JFK-MIA
456    MIA-JFK,JFK-YYZ
678            MIA-HOU
dtype: object

Any Ideas?

Upvotes: 1

Views: 45

Answers (1)

Erfan
Erfan

Reputation: 42916

We can use groupby in combination with agg and then apply '-'.join():

test1['COUPON']=test1['COUPON'].astype(str)

final = test1.groupby(['TICKET', 'PAX', 'PAID']).agg({'ROUTE':'-'.join,
                                                      'COUPON':'-'.join}).reset_index()
print(final)
  TICKET       PAX   PAID            ROUTE COUPON
0    123   Jessica  100.0  MIA-ORD-ORD-MIA    1-2
1    345      Alex  200.0  MIA-AUA-JFK-MIA    4-3
2    456    Ernest  200.0  MIA-JFK-JFK-YYZ    1-2
3    678  Jamanica  100.0          MIA-HOU      1

Upvotes: 3

Related Questions