ClementWalter

Reputation: 5272

Pandas explode to create new columns

The pandas explode method creates a new row for each value in the inner list of a given column; in other words, it is a row-wise explode.
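For reference, here is a minimal sketch of that row-wise behaviour on the same data as the MWE below. It assumes pandas 0.25 or later, where DataFrame.explode was introduced.

```python
import pandas as pd

df = pd.DataFrame({"a": ["a", "b"], "s": [[1, 2], [3, 4]]})

# Row-wise explode: every element of each list gets its own row,
# and the original index label is repeated for each element.
rows = df.explode("s")

print(rows.index.tolist())  # [0, 0, 1, 1]
print(rows["s"].tolist())   # [1, 2, 3, 4]
```

The result has more rows but the same two columns, which is the opposite of the column-wise reshaping the question asks for.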

Is there an easy column-wise explode already implemented in pandas, i.e. something that transforms df into the second DataFrame below?

MWE:

>>> import pandas as pd
>>> s = pd.DataFrame([[1, 2], [3, 4]]).agg(list, axis=1)
>>> df = pd.DataFrame({"a": ["a", "b"], "s": s})
>>> df
Out: 
   a       s
0  a  [1, 2]
1  b  [3, 4]

>>> pd.DataFrame(s.tolist()).assign(a=["a", "b"]).reindex(["a", 0, 1], axis=1)
Out:
   a  0  1
0  a  1  2
1  b  3  4

Upvotes: 0

Views: 3390

Answers (2)

S.MC.

Reputation: 1711

I benchmarked the other answer's apply(pd.Series) approach and found that the approach below is almost 5x faster with the following setup:

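import pandas as pd
from typing import List

n_rows = 100000
list_dim = 384
a: List[str] = ["a"] * n_rows
# every row references the same inner list object; fine here because it is only read
b: List[List[float]] = [[0.1] * list_dim] * n_rows
df = pd.DataFrame({"a": a, "b": b})

def expand_list_col(df: pd.DataFrame, column_name: str) -> pd.DataFrame:
    # new column names <column_name>_0, <column_name>_1, ..., sized from the first row's list
    # (all lists are assumed to have the same length)
    new_col_names: List[str] = [
        f"{column_name}_{i}" for i in range(len(df[column_name].iloc[0]))
    ]
    # build all new columns in one constructor call instead of one Series per row;
    # this frame already has a fresh RangeIndex
    df_col_expanded = pd.DataFrame(
        df[column_name].to_list(), columns=new_col_names
    )
    # drop the list column and reset to a RangeIndex so both frames align in concat;
    # the caller's DataFrame is not modified
    rest = df.drop(columns=[column_name]).reset_index(drop=True)
    return pd.concat([rest, df_col_expanded], axis=1)

df_expanded = expand_list_col(df, "b")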
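A minimal sketch for checking the speed difference yourself. It compares building one pd.Series per row (apply(pd.Series)) against building a single DataFrame from to_list(). The sizes are deliberately smaller than the setup above so it runs quickly, and the function names are hypothetical; no particular speedup is implied, since timings depend on the pandas version and the machine.

```python
import timeit

import pandas as pd

# Smaller than the setup above (assumption: just enough to see a difference quickly).
n_rows = 2_000
list_dim = 16
df = pd.DataFrame({"a": ["a"] * n_rows, "b": [[0.1] * list_dim for _ in range(n_rows)]})


def with_apply_series(frame: pd.DataFrame) -> pd.DataFrame:
    # one pd.Series is constructed for every row
    return pd.concat([frame.drop(columns=["b"]), frame["b"].apply(pd.Series)], axis=1)


def with_to_list(frame: pd.DataFrame) -> pd.DataFrame:
    # one DataFrame is constructed from the whole list of lists
    expanded = pd.DataFrame(frame["b"].to_list(), index=frame.index)
    return pd.concat([frame.drop(columns=["b"]), expanded], axis=1)


# Sanity check: both approaches should produce identical frames.
same = with_apply_series(df).equals(with_to_list(df))
print("same result:", same)

for fn in (with_apply_series, with_to_list):
    seconds = timeit.timeit(lambda: fn(df), number=3)
    print(f"{fn.__name__}: {seconds:.3f}s for 3 runs")
```

Raising n_rows and list_dim toward the original setup should make any gap more visible.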

Upvotes: 2

ThePyGuy

Reputation: 18406

You can use apply with pd.Series to convert each list into a Series; pandas then assembles those Series into a DataFrame with one column per list position:

>>> s.apply(pd.Series)
Out:
   0  1
0  1  2
1  3  4

As a side note, agg(list, axis=1) returns a pandas Series of lists, not a DataFrame.
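A quick sketch of that point, using the question's own construction: aggregating each row with list yields a Series whose elements are lists, and applying pd.Series to it produces one column per list position.

```python
import pandas as pd

s = pd.DataFrame([[1, 2], [3, 4]]).agg(list, axis=1)

# agg(list, axis=1) collapses each row into a single list, so the result is a Series
is_series = isinstance(s, pd.Series)
print(is_series)     # True
print(s.tolist())    # [[1, 2], [3, 4]]

# apply calls pd.Series on each list; pandas stacks the returned Series into a DataFrame
wide = s.apply(pd.Series)
print(wide.columns.tolist())  # [0, 1]
```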

For the updated data, you can concatenate the result above with the existing DataFrame:

>>> pd.concat([df, df['s'].apply(pd.Series)], axis=1)
Out:
   a       s  0  1
0  a  [1, 2]  1  2
1  b  [3, 4]  3  4
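An alternative sketch (not part of this answer) that builds all new columns with a single DataFrame constructor call instead of one pd.Series per row, the same idea the other answer benchmarks. It assumes every list has the same length (shorter lists would be padded with NaN); passing index=df.index keeps rows aligned even when df does not have a default RangeIndex. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": ["a", "b"], "s": [[1, 2], [3, 4]]})

# One constructor call for all new columns; reuse df's index so join aligns rows correctly.
expanded = pd.DataFrame(df["s"].tolist(), index=df.index)

# Drop the list column and attach the new columns side by side.
result = df.drop(columns="s").join(expanded)
print(result.columns.tolist())  # ['a', 0, 1]
```

This yields the question's target layout (a, 0, 1) without hard-coding the values of column a or the column order.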

Upvotes: 3
