For the below DataFrame: ID Reg Departure Arrival Date Time 1 ABC DUB LCY 22/0/22 15:23 2 ABC DUB LCY 22/0/22 15:27 3 CBA CPH HEL 22/0/22 12:21 4 CBA CPH HE 22/0/22 12:19 I would like to pivot so that it so that the resulting DataFrame is: ID Reg Departure Arrival Date Time ID2 Time2 1 ABC DUB LCY 22/0/22 15:23 2 15:27 3 CBA CPH HEL 22/0/22 12:21 4 12:19 I've been playing around with GroupBy & Pivot but haven't managed to crack it, any help is appreciated!

Reputation: 35

Python: Pandas Pivot DataFrame

For the below DataFrame:

ID	Reg	Departure	Arrival	Date	Time
1	ABC	DUB	LCY	22/0/22	15:23
2	ABC	DUB	LCY	22/0/22	15:27
3	CBA	CPH	HEL	22/0/22	12:21
4	CBA	CPH	HE	22/0/22	12:19

I would like to pivot so that it so that the resulting DataFrame is:

ID	Reg	Departure	Arrival	Date	Time	ID2	Time2
1	ABC	DUB	LCY	22/0/22	15:23	2	15:27
3	CBA	CPH	HEL	22/0/22	12:21	4	12:19

I've been playing around with GroupBy & Pivot but haven't managed to crack it, any help is appreciated!

Upvotes: 1

Answers (2)

Denny Chen

Reputation: 499

Initialize the dataframe

import pandas as pd
import io
str = '''ID Reg Departure   Arrival Date    Time
1   ABC DUB LCY 22/0/22 15:23
2   ABC DUB LCY 22/0/22 15:27
3   CBA CPH HEL 22/0/22 12:21
4   CBA CPH HEL 22/0/22 12:19'''

df = pd.read_csv(io.StringIO(str), sep = '\s+')

We first create a sequence number by group for unstack() later .

df["grpseq"] = df.groupby(["Reg", "Departure", "Arrival", "Date"]).cumcount()

df_new = df.set_index(["Reg", "Departure", "Arrival", "Date", "grpseq"]).unstack(level = -1).reset_index()
df_new.columns = ["Reg", "Departure", "Arrival", "Date", "ID1", "ID2", "Time1", "Time2"]

df_new
    Reg  Departure  Arrival    Date    ID1  ID2 Time1   Time2
0   ABC      DUB       LCY     22/0/22  1   2   15:23   15:27
1   CBA      CPH       HEL     22/0/22  3   4   12:21   12:19

Where ID1 represent the ID column in your desire table. And a simple reorder of the columns will be done.

Upvotes: 1

Cam

Reputation: 1731

So this will work if you have just two of each Reg row. But if you have more duplicate rows you will need something more robust.

import data and pandas

import pandas as pd

output

    ID  Reg Departure   Arrival Date    Time
0   1   ABC DUB         LCY     22/0/22 15:23
1   2   ABC DUB         LCY     22/0/22 15:27
2   3   CBA CPH         HEL     22/0/22 12:21
3   4   CBA CPH         HE      22/0/22 12:19

make two frames with with 'firsts' and one with 'lasts'

final_df = df.drop_duplicates(keep='first', subset='Departure')
id_time_df = df.drop_duplicates(keep='last', subset='Departure')

Then merge the two frames

pd.merge(final_df, id_time_df[['ID', 'Reg', 'Time']], on='Reg')

output

    ID_x    Reg Departure   Arrival Date    Time_x  ID_y    Time_y
0   1       ABC DUB         LCY     22/0/22 15:23   2       15:27
1   3       CBA CPH         HEL     22/0/22 12:21   4       12:19

Upvotes: 2

Python: Pandas Pivot DataFrame

Answers (2)

Related Questions