Reputation: 2532
I want to unpack multiple columns of lists into many more columns. Basically this but for multiple columns of lists rather than just one, and avoiding for loops.
As an example, I have a pandas.DataFrame:
import pandas as pd
tst = pd.DataFrame({'A': [[1, 2]]* 5, 'B': [[3, 4]]* 5, 'C': [[5, 6]] * 5})
I can easily unpack one of the columns, e.g. A, into multiple columns:
pd.DataFrame(
    tst['A'].to_list(),
    columns=['1' + tst['A'].name, '2' + tst['A'].name],
    index=list(range(tst['A'].shape[0]))
)
However, when I tried extending this to multiple columns using .apply to avoid a for loop,
tst.apply(
    lambda x: pd.DataFrame(
        x.to_list(),
        columns=['1' + x.name, '2' + x.name],
        index=list(range(x.shape[0]))
    )
)
I get the error below, even though I am supplying an index:
...
ValueError: If using all scalar values, you must pass an index
Is there a way to fix this so that I get an output as per below? (column order doesn't matter)
   1C  2C  1B  2B  1A  2A
0   5   6   3   4   1   2
1   5   6   3   4   1   2
2   5   6   3   4   1   2
3   5   6   3   4   1   2
4   5   6   3   4   1   2
pd.__version__ == '1.0.5'
Upvotes: 1
Views: 225
Reputation: 71687
You can horizontally stack the columns, then create a new dataframe and rename the columns:
import numpy as np
df = pd.DataFrame(np.hstack(tst.values.T.tolist()))
df.columns = [f'{i}{c}' for c in tst for i in range(1, 3)]
Alternatively, you can concat along axis=1:
df = pd.concat([pd.DataFrame(tst[c].tolist()) for c in tst], axis=1)
df.columns = [f'{i}{c}' for c in tst for i in range(1,3)]
print(df)
   1A  2A  1B  2B  1C  2C
0   1   2   3   4   5   6
1   1   2   3   4   5   6
2   1   2   3   4   5   6
3   1   2   3   4   5   6
4   1   2   3   4   5   6
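If the lists are not always of length 2, the hard-coded range(1, 3) will no longer match the number of new columns. Here is a minimal sketch that derives the names from the unpacked width instead. It assumes every list within a given column has the same length, and the helper name unpack_list_columns is hypothetical (it is not part of pandas):

```python
import pandas as pd

tst = pd.DataFrame({'A': [[1, 2]] * 5, 'B': [[3, 4]] * 5, 'C': [[5, 6]] * 5})


def unpack_list_columns(frame):
    """Hypothetical helper: expand each list-valued column into numbered columns."""
    parts = []
    for col in frame.columns:
        # One new column per list position, keeping the original row index.
        part = pd.DataFrame(frame[col].tolist(), index=frame.index)
        # Name the pieces 1<col>, 2<col>, ... based on how many were produced.
        part.columns = [f'{i}{col}' for i in range(1, part.shape[1] + 1)]
        parts.append(part)
    # Glue the per-column frames together side by side.
    return pd.concat(parts, axis=1)


df = unpack_list_columns(tst)
```

Passing index=frame.index keeps the original row labels, which matters if the input dataframe does not have a default RangeIndex.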
Upvotes: 1
Reputation: 672
If you don't mind swapping apply for explode, this is a one-line solution:
res = pd.concat([pd.DataFrame(tst[[x]].explode(x).values.reshape(-1, 2), columns=['1' + x, '2' + x]) for x in tst.columns], axis=1)
print(res)
Which returns:
   1A  2A  1B  2B  1C  2C
0   1   2   3   4   5   6
1   1   2   3   4   5   6
2   1   2   3   4   5   6
3   1   2   3   4   5   6
4   1   2   3   4   5   6
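One caveat worth checking: as far as I can tell, explode hands the values back with object dtype, so you may want to cast afterwards. The sketch below is a rough sanity check, assuming every list has exactly two integer elements. It compares the explode route against plain tolist unpacking; the names via_explode, via_tolist and same are only illustrative:

```python
import pandas as pd

tst = pd.DataFrame({'A': [[1, 2]] * 5, 'B': [[3, 4]] * 5, 'C': [[5, 6]] * 5})

# explode puts each list element on its own row; reshape folds them back into pairs.
via_explode = pd.concat(
    [pd.DataFrame(tst[[x]].explode(x).values.reshape(-1, 2),
                  columns=['1' + x, '2' + x])
     for x in tst.columns],
    axis=1,
)

# Reference result: unpack each column of lists directly.
via_tolist = pd.concat(
    [pd.DataFrame(tst[x].tolist(), columns=['1' + x, '2' + x])
     for x in tst.columns],
    axis=1,
)

# Cast the exploded values to int before comparing element-wise.
via_explode = via_explode.astype(int)
same = bool((via_explode.values == via_tolist.values).all())
```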
Upvotes: 1