jcoke
jcoke

Reputation: 1891

Dynamically explode list columns in Pandas

I have a list of dictionaries which has been converted into a dataframe using json_normalize.

Dataset:

x = [{ 
    "_id" : 71, 
    "Ids" : [
        "10", 
        "59"
    ], 
    "roles" : [
        "janitor", "mechanic", "technician"
    ]
}]

Dataframe:

   _id       Ids                            roles
    71  [10, 59]  [janitor, mechanic, technician]

What I am trying to do is find a way to dynamically explode all the list columns/keys (Ids and roles) without explicitly typing the column names. Is this possible?

Desired Output:

   _id       Ids         roles
    71        10       janitor
    71        10      mechanic
    71        10    technician
    71        59       janitor
    71        59      mechanic
    71        59    technician

Any assistance would be appreciated.

Upvotes: 0

Views: 220

Answers (3)

Giovanni Frison
Giovanni Frison

Reputation: 688

I would do it like this:

from itertools import product

list_ = [df.iloc[0,i] if type(df.iloc[0,i]) == list else [df.iloc[0,i]] for i in range(df.shape[1])]
prod = list(product(*list_))
df = pd.DataFrame(prod, columns=df.columns)

Upvotes: 0

Lukas Schmid
Lukas Schmid

Reputation: 1960

A naive solution iterating through all entries checking for lists.

Works for multiple lines of entries and nested lists.

    while True:
        newdf = pd.DataFrame(columns=df.columns)
        for row in df.values:
            for index, value in enumerate(row):
                if isinstance(value, list):
                    for listentry in value:
                        newdf.loc[len(newdf)] = [*row[:index], listentry, *row[index+1:]] 
                    break
            else:
                newdf.loc[len(newdf)] = row
        df = newdf.copy(deep=True)
        if not newdf.applymap(lambda value:isinstance(value, list)).values.any():
            break

Upvotes: 1

Epsi95
Epsi95

Reputation: 9047

I am not sure about the efficiency of this, but it just iterating over the dataframe column names and check whether it is list or not. If it is list just explode.

df_final = df.copy()

for c in df.columns:
    if(isinstance(df[c][0],list)):
        df_final = df_final.explode(c)
    _id Ids roles
0   71  10  janitor
0   71  10  mechanic
0   71  10  technician
0   71  59  janitor
0   71  59  mechanic
0   71  59  technician

Upvotes: 2

Related Questions