Python Pandas Group by Count Pivot of multiple colunms

Question

Given the Header as TimStamp, Gender, and Race, I would like to have the count of Gender and Count of Race Group by the Time Stamp. Given Gender = M/F Race = A/B/C/D/E .

Race = E may not exist in the dataSet, but anticipate data with Race E. Hence a Place Holder is Important with value Zero.

Example of Data and output shows below.

Time_stamp 12:30, 12:30, 12:30, 12:30, 12:31, 12:31, 12:32,

Gender = M, F, F, F, M, F, M

Race = A, A, B, B, C, A, D

ivallesp · Accepted Answer

You will need to do two pivot tables for that:

For achieving a table with all the specified categories in Raceyou need to convert Race to categorical variable:

df["Race"] = pd.Categorical(df.Race, categories=["A", "B", "C", "D", "E"])

For gender:

 df_g =  df.groupby(["Time_stamp", "Gender"], observed=False).count().fillna(0).unstack()

And for Race:

df_r = (df.groupby(["Time_stamp", "Race"], observed=False)
        .count().fillna(0).reset_index()
        .astype({"Race": str}).pivot_table(index="Time_stamp", columns="Race"))

And then you can join them:

df_report = df_r.join(df_g)  
df_report.columns = df_report.columns.droplevel()

Python Pandas Group by Count Pivot of multiple colunms

Answers (1)

Related Questions