Homan Mohammadi

Reputation: 55

Pandas: Pivoting on column of lists?

I have this DataFrame in Pandas:

                      Age InterventionType PrimaryPurpose
0         [Adult, Senior]           Device      Treatment
1  [Child, Adult, Senior]             Drug  Basic Science
2         [Adult, Senior]             Drug      Treatment
3         [Adult, Senior]              NaN            NaN
4         [Adult, Senior]            Other            NaN

I want to pivot on InterventionType so that I get a count of each age group per intervention type:

Drug     Adult     2
         Senior    2
         Child     1
Device   Adult     1
         Senior    1
Other    Adult     1
         Senior    1

How do I accomplish this? Also, is it non-standard for my DataFrame to have lists? If so, what is a good practice to "de-list" the lists?

Upvotes: 1

Views: 1617

Answers (2)

hnagaty

Reputation: 848

You can use explode() followed by groupby():

import numpy as np
import pandas as pd

Age = [["Adult", "Senior"],
        ["Child", "Adult", "Senior"],
        ["Adult", "Senior"],
        ["Adult", "Senior"],
        ["Adult", "Senior"]]
# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
InterventionType = ["Device", "Drug", "Drug", np.nan, "Other"]

PrimaryPurpose = ["Treatment", "Basic Science", "Treatment", np.nan, np.nan]

df = pd.DataFrame({"Age": Age,
                   "InterventionType": InterventionType,
                   "PrimaryPurpose": PrimaryPurpose})

df1 = df.explode("Age")


df1.groupby(["InterventionType", "Age"])[["InterventionType"]].count()\
    .rename(columns = {"InterventionType": "Count"})
Out[29]: 
                         Count
InterventionType Age          
Device           Adult       1
                 Senior      1
Drug             Adult       2
                 Child       1
                 Senior      2
Other            Adult       1
                 Senior      1
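
As a sketch of a slightly shorter variant of the same idea: groupby(...).size() counts the rows in each group directly, so no column selection or rename is needed. This assumes pandas 0.25 or later (where explode() was introduced); the names exploded, counts and "Count" are only illustrative. groupby() drops NaN keys by default, which is why row 3 does not show up in the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [["Adult", "Senior"],
            ["Child", "Adult", "Senior"],
            ["Adult", "Senior"],
            ["Adult", "Senior"],
            ["Adult", "Senior"]],
    "InterventionType": ["Device", "Drug", "Drug", np.nan, "Other"],
})

# explode() turns each list element into its own row,
# repeating the other column values for that row.
exploded = df.explode("Age")

# size() counts the rows in each (InterventionType, Age) group;
# groups with a NaN key are dropped by default.
counts = exploded.groupby(["InterventionType", "Age"]).size().rename("Count")
print(counts)
```

If the NaN intervention type should be counted as well, passing dropna=False to groupby() (available since pandas 1.1) keeps that group.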

Upvotes: 1

Harry O'Reilly

Reputation: 41

Assuming that your DataFrame is called df, you can do the following:

pd.DataFrame(
    df["Age"].values.tolist(),
    index=df["InterventionType"]
).stack().reset_index().drop("level_1", axis=1).groupby(["InterventionType", 0]).size()

This returns the following:

InterventionType  0
Device            Adult     1
                  Senior    1
Drug              Adult     2
                  Child     1
                  Senior    2
Other             Adult     1
                  Senior    1
dtype: int64

This is a pandas Series with a MultiIndex. The index level names produced by the code above are "InterventionType" and 0; these can be changed afterwards, for example with rename_axis().
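
A minimal sketch of that renaming step, rebuilding the question's data so it runs on its own. The names counts, renamed, flat and "Count" are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [["Adult", "Senior"],
            ["Child", "Adult", "Senior"],
            ["Adult", "Senior"],
            ["Adult", "Senior"],
            ["Adult", "Senior"]],
    "InterventionType": ["Device", "Drug", "Drug", np.nan, "Other"],
})

# Same pipeline as above: spread the lists into columns, stack them
# back into rows, then count each (InterventionType, value) pair.
counts = (
    pd.DataFrame(df["Age"].values.tolist(), index=df["InterventionType"])
    .stack()
    .reset_index()
    .drop("level_1", axis=1)
    .groupby(["InterventionType", 0])
    .size()
)

# Give the second index level a descriptive name instead of 0.
renamed = counts.rename_axis(["InterventionType", "Age"])

# Alternatively, flatten the result into an ordinary DataFrame
# with a named count column.
flat = renamed.reset_index(name="Count")
print(flat)
```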

Upvotes: 0
