Ronnie_drew
Ronnie_drew

Reputation: 35

Python: Pandas Pivot DataFrame

For the below DataFrame:

ID Reg Departure Arrival Date Time
1 ABC DUB LCY 22/0/22 15:23
2 ABC DUB LCY 22/0/22 15:27
3 CBA CPH HEL 22/0/22 12:21
4 CBA CPH HE 22/0/22 12:19

I would like to pivot so that it so that the resulting DataFrame is:

ID Reg Departure Arrival Date Time ID2 Time2
1 ABC DUB LCY 22/0/22 15:23 2 15:27
3 CBA CPH HEL 22/0/22 12:21 4 12:19

I've been playing around with GroupBy & Pivot but haven't managed to crack it, any help is appreciated!

Upvotes: 1

Views: 127

Answers (2)

Denny Chen
Denny Chen

Reputation: 499

Initialize the dataframe

import pandas as pd
import io
str = '''ID Reg Departure   Arrival Date    Time
1   ABC DUB LCY 22/0/22 15:23
2   ABC DUB LCY 22/0/22 15:27
3   CBA CPH HEL 22/0/22 12:21
4   CBA CPH HEL 22/0/22 12:19'''

df = pd.read_csv(io.StringIO(str), sep = '\s+')

We first create a sequence number by group for unstack() later .

df["grpseq"] = df.groupby(["Reg", "Departure", "Arrival", "Date"]).cumcount()

df_new = df.set_index(["Reg", "Departure", "Arrival", "Date", "grpseq"]).unstack(level = -1).reset_index()
df_new.columns = ["Reg", "Departure", "Arrival", "Date", "ID1", "ID2", "Time1", "Time2"]

df_new
    Reg  Departure  Arrival    Date    ID1  ID2 Time1   Time2
0   ABC      DUB       LCY     22/0/22  1   2   15:23   15:27
1   CBA      CPH       HEL     22/0/22  3   4   12:21   12:19

Where ID1 represent the ID column in your desire table. And a simple reorder of the columns will be done.

Upvotes: 1

Cam
Cam

Reputation: 1731

So this will work if you have just two of each Reg row. But if you have more duplicate rows you will need something more robust.

import data and pandas

import pandas as pd

output

    ID  Reg Departure   Arrival Date    Time
0   1   ABC DUB         LCY     22/0/22 15:23
1   2   ABC DUB         LCY     22/0/22 15:27
2   3   CBA CPH         HEL     22/0/22 12:21
3   4   CBA CPH         HE      22/0/22 12:19

make two frames with with 'firsts' and one with 'lasts'

final_df = df.drop_duplicates(keep='first', subset='Departure')
id_time_df = df.drop_duplicates(keep='last', subset='Departure')

Then merge the two frames

pd.merge(final_df, id_time_df[['ID', 'Reg', 'Time']], on='Reg')

output

    ID_x    Reg Departure   Arrival Date    Time_x  ID_y    Time_y
0   1       ABC DUB         LCY     22/0/22 15:23   2       15:27
1   3       CBA CPH         HEL     22/0/22 12:21   4       12:19

Upvotes: 2

Related Questions