Reputation: 55
I have this DataFrame in Pandas:
   Age                     InterventionType  PrimaryPurpose
0  [Adult, Senior]         Device            Treatment
1  [Child, Adult, Senior]  Drug              Basic Science
2  [Adult, Senior]         Drug              Treatment
3  [Adult, Senior]         NaN               NaN
4  [Adult, Senior]         Other             NaN
I want to pivot on InterventionType, so that I get:
Drug    Adult   2
        Senior  2
        Child   1
Device  Adult   1
        Senior  1
Other   Adult   1
        Senior  1
How do I accomplish this? Also, is it non-standard for my DataFrame to have lists? If so, what is a good practice to "de-list" the lists?
Upvotes: 1
Views: 1617
Reputation: 848
You can use explode() followed by groupby():
import numpy as np
import pandas as pd

Age = [["Adult", "Senior"],
       ["Child", "Adult", "Senior"],
       ["Adult", "Senior"],
       ["Adult", "Senior"],
       ["Adult", "Senior"]]
# np.nan marks missing values (the np.NaN alias was removed in NumPy 2.0)
InterventionType = ["Device", "Drug", "Drug", np.nan, "Other"]
PrimaryPurpose = ["Treatment", "Basic Science", "Treatment", np.nan, np.nan]
df = pd.DataFrame({"Age": Age,
                   "InterventionType": InterventionType,
                   "PrimaryPurpose": PrimaryPurpose})

# Give each element of the Age lists its own row
df1 = df.explode("Age")
# Count rows per (InterventionType, Age) pair; groupby drops rows whose key is NaN
df1.groupby(["InterventionType", "Age"])[["InterventionType"]].count()\
   .rename(columns={"InterventionType": "Count"})
Out[29]:
                          Count
InterventionType  Age
Device            Adult       1
                  Senior      1
Drug              Adult       2
                  Child       1
                  Senior      2
Other             Adult       1
                  Senior      1
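As a possible variant of the same idea, here is a minimal sketch that counts with size() instead of count(), which avoids selecting and renaming a column. The names exploded, counts and wide are my own, and the unstack() step is an optional extra that was not part of the question; it only shows how the counts could be laid out as a wide table.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "Age": [["Adult", "Senior"],
            ["Child", "Adult", "Senior"],
            ["Adult", "Senior"],
            ["Adult", "Senior"],
            ["Adult", "Senior"]],
    "InterventionType": ["Device", "Drug", "Drug", np.nan, "Other"],
    "PrimaryPurpose": ["Treatment", "Basic Science", "Treatment", np.nan, np.nan],
})

# explode() "de-lists" the column: one row per list element,
# with the other columns repeated for each element
exploded = df.explode("Age")

# size() counts rows per (InterventionType, Age) pair; rows whose
# InterventionType is NaN are dropped by groupby's default dropna=True
counts = exploded.groupby(["InterventionType", "Age"]).size().rename("Count")

# Optional: one row per InterventionType, one column per Age group,
# with 0 where a combination never occurs
wide = counts.unstack(fill_value=0)
```

Here counts is a Series with a MultiIndex, while wide is a regular DataFrame, which may be easier to read or plot.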
Upvotes: 1
Reputation: 41
Assuming that your DataFrame is called df, you can do the following:
pd.DataFrame(
    df["Age"].values.tolist(),
    index=df["InterventionType"]
).stack().reset_index().drop("level_1", axis=1).groupby(["InterventionType", 0]).size()
This returns the following:
InterventionType  0
Device            Adult     1
                  Senior    1
Drug              Adult     2
                  Child     1
                  Senior    2
Other             Adult     1
                  Senior    1
dtype: int64
This is a pandas Series with a MultiIndex. The index level names produced by the code above are "InterventionType" and 0, but these can easily be changed, for example with rename_axis().
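One way the index levels could be renamed is sketched below. The names counts, renamed and table, and the labels "Age" and "Count", are my own choices for illustration rather than anything required by pandas.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "Age": [["Adult", "Senior"],
            ["Child", "Adult", "Senior"],
            ["Adult", "Senior"],
            ["Adult", "Senior"],
            ["Adult", "Senior"]],
    "InterventionType": ["Device", "Drug", "Drug", np.nan, "Other"],
    "PrimaryPurpose": ["Treatment", "Basic Science", "Treatment", np.nan, np.nan],
})

# The pipeline from this answer: spread the lists into columns,
# stack them back into rows, then count each pair
counts = pd.DataFrame(
    df["Age"].values.tolist(),
    index=df["InterventionType"]
).stack().reset_index().drop("level_1", axis=1).groupby(["InterventionType", 0]).size()

# Replace the default level name 0 with a descriptive one
renamed = counts.rename_axis(["InterventionType", "Age"])

# Optionally flatten the MultiIndex into ordinary columns
table = renamed.reset_index(name="Count")
```

After this, renamed is still a Series with a MultiIndex, and table is a flat DataFrame with the columns InterventionType, Age and Count.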
Upvotes: 0