Reputation: 1721
Given the Header as TimStamp, Gender, and Race, I would like to have the count of Gender and Count of Race Group by the Time Stamp. Given Gender = M/F Race = A/B/C/D/E .
Race = E may not exist in the dataSet, but anticipate data with Race E. Hence a Place Holder is Important with value Zero.
Example of Data and output shows below.
Time_stamp 12:30, 12:30, 12:30, 12:30, 12:31, 12:31, 12:32,
Gender = M, F, F, F, M, F, M
Race = A, A, B, B, C, A, D
Upvotes: 1
Views: 420
Reputation: 2212
You will need to do two pivot tables for that:
For achieving a table with all the specified categories in Race
you need to convert Race
to categorical variable:
df["Race"] = pd.Categorical(df.Race, categories=["A", "B", "C", "D", "E"])
For gender:
df_g = df.groupby(["Time_stamp", "Gender"], observed=False).count().fillna(0).unstack()
And for Race:
df_r = (df.groupby(["Time_stamp", "Race"], observed=False)
.count().fillna(0).reset_index()
.astype({"Race": str}).pivot_table(index="Time_stamp", columns="Race"))
And then you can join them:
df_report = df_r.join(df_g)
df_report.columns = df_report.columns.droplevel()
Upvotes: 2