Combine multiple rows based on one colum value and add extra columns based on other column value in Pandas

Question

I have the following dataframe:

     Name   rollNumber   external_roll_number    testDate      marks 

0    John      34             234               2021-04-28      15 

1    John      34             234               2021-03-28      25

I would like to convert it like this:

     Name   rollNumber   external_roll_number    testMonth      marks    testMonth      marks

0    John      34             234                  April          15       March         25

If the above is not possible then I would atleast want it to be like this:

     Name   rollNumber   external_roll_number    testDate      marks    testDate      marks

0    John      34             234                2021-04-28      15     2021-03-28       25

How can I convert my dataframe to the desired output? This change will be based on the Name column of the rows.

EDIT 1

I tried using pivot_table like this but I did not get the desired result.

merged_df_pivot = pd.pivot_table(merged_df, index=["name", "testDate"], aggfunc="first", dropna=False).fillna("")

When I try to iterate through the merged_df_pivot like this:

for index, details in merged_df_pivot.iterrows():

I am again getting two rows and also I was not able to add the new testMonth column by the above method.

Rob Raymond · Accepted Answer

core is unstack() month to be columns
detail then to re-structure month-by month marks columns to required structure
generally consider bad practice to have duplicate column names, hence have suffixed them

df = pd.read_csv(io.StringIO("""     Name   rollNumber   external_roll_number    testDate      marks 
0    John      34             234               2021-04-28      15 
1    John      34             234               2021-03-28      25
"""), sep="\s+")

df["testDate"] =pd.to_datetime(df["testDate"])
df = df.assign(testMonth = df["testDate"].dt.strftime("%B")).drop(columns="testDate")


dft = (df.set_index([c for c in df.columns if c!="marks"])
 .unstack("testMonth") # make month a column
 .droplevel(0, axis=1) # remove unneeded level in columns
 # create columns for months from column names and rename marks columns
 .pipe(lambda d: d.assign(**{f"testMonth_{i+1}":c 
                             for i,c in enumerate(d.columns)}).rename(columns={c:f"marks_{i+1}" 
                                                                               for i,c in enumerate(d.columns)}))
 .reset_index()
)

output

	Name	rollNumber	external_roll_number	marks_1	marks_2	testMonth_1	testMonth_2
0	John	34	234	15	25	April	March

Combine multiple rows based on one colum value and add extra columns based on other column value in Pandas

Answers (1)

output

Related Questions