Reputation: 5272
The pandas explode method creates a new row for each value found in the inner list of a given column, so it is a row-wise explode.
Is there an easy column-wise explode already implemented in pandas, i.e. something to transform df into the second dataframe?
MWE:
>>> s = pd.DataFrame([[1, 2], [3, 4]]).agg(list, axis=1)
>>> df = pd.DataFrame({"a": ["a", "b"], "s": s})
>>> df
Out:
a s
0 a [1, 2]
1 b [3, 4]
>>> pd.DataFrame(s.tolist()).assign(a=["a", "b"]).reindex(["a", 0, 1], axis=1)
Out[121]:
a 0 1
0 a 1 2
1 b 3 4
Upvotes: 0
Views: 3390
Reputation: 1711
I benchmarked the approaches from the other answers and found that the approach below is almost 5x faster with the following setup:
import pandas as pd
from typing import List

n_rows = 100000
list_dim = 384
a: List[str] = ["a"] * n_rows
b: List[List[float]] = [[0.1] * list_dim] * n_rows
df = pd.DataFrame({"a": a, "b": b})


def expand_list_col(df: pd.DataFrame, column_name: str) -> pd.DataFrame:
    # One new column per list element; assumes every list has the same length as the first.
    new_col_names: List[str] = [
        f"{column_name}_{i}" for i in range(len(df[column_name].iloc[0]))
    ]
    df_col_expanded = pd.DataFrame(
        df[column_name].to_list(), columns=new_col_names
    )
    # Drop the list column and reset the index without mutating the caller's DataFrame.
    df = df.drop(columns=[column_name]).reset_index(drop=True)
    return pd.concat([df, df_col_expanded], axis=1)
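To check the speed claim on your own machine, a rough timing sketch along these lines may help. The helper names via_to_list and via_apply are hypothetical, the data sizes are deliberately small so it runs quickly, and the actual ratio will depend on your pandas version and data shape.

```python
import timeit

import pandas as pd

# Small, made-up data so the comparison runs quickly; the sizes are assumptions.
n_rows, list_dim = 2_000, 16
df = pd.DataFrame({"a": ["a"] * n_rows, "b": [[0.1] * list_dim] * n_rows})


def via_to_list(frame: pd.DataFrame, column_name: str) -> pd.DataFrame:
    """Expand a list column by building one DataFrame from the nested lists."""
    names = [f"{column_name}_{i}" for i in range(len(frame[column_name].iloc[0]))]
    expanded = pd.DataFrame(
        frame[column_name].to_list(), columns=names, index=frame.index
    )
    return pd.concat([frame.drop(columns=[column_name]), expanded], axis=1)


def via_apply(frame: pd.DataFrame, column_name: str) -> pd.DataFrame:
    """Expand a list column with apply(pd.Series), as in the other answer."""
    expanded = frame[column_name].apply(pd.Series).add_prefix(f"{column_name}_")
    return pd.concat([frame.drop(columns=[column_name]), expanded], axis=1)


fast = via_to_list(df, "b")
slow = via_apply(df, "b")

# Time a few repetitions of each approach; no particular ratio is guaranteed.
t_fast = timeit.timeit(lambda: via_to_list(df, "b"), number=3)
t_slow = timeit.timeit(lambda: via_apply(df, "b"), number=3)
print(f"to_list: {t_fast:.3f}s, apply(pd.Series): {t_slow:.3f}s")
```

Both helpers leave the input frame untouched and return the same column layout, so only the timing differs.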
Upvotes: 2
Reputation: 18406
You can use apply to convert those values to pandas Series, which transforms the dataframe into the required format:
>>> df.apply(pd.Series)
Out[28]:
0 1
0 1 2
1 3 4
As a side note, your df becomes a pandas Series after using agg.
For the updated data, you can concat the above result to the existing data frame:
>>> pd.concat([df, df['s'].apply(pd.Series)], axis=1)
Out[48]:
a s 0 1
0 a [1, 2] 1 2
1 b [3, 4] 3 4
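If you also want to drop the original list column, as in the target frame from the question, one possible variant is sketched below. It assumes all lists have the same length and uses a made-up non-default index to show that the rows stay aligned; the variable names are illustrative only.

```python
import pandas as pd

# Same toy data as the question, but with a non-default index to show alignment.
df = pd.DataFrame({"a": ["a", "b"], "s": [[1, 2], [3, 4]]}, index=[10, 20])

# Build the expanded columns on the original index so the join aligns row by row.
expanded = pd.DataFrame(df["s"].tolist(), index=df.index)

# Drop the list column and attach the expanded values.
result = df.drop(columns="s").join(expanded)
print(result)
```

Passing index=df.index matters when the frame does not have a default RangeIndex; without it, join or concat would align on mismatched labels and produce NaN values.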
Upvotes: 3