RK1

Reputation: 2532

Unpacking many columns of lists using apply raises ValueError: If using all scalar values, you must pass an index

I want to unpack multiple columns of lists into many more columns. Basically this but for multiple columns of lists rather than just one, and avoiding for loops.

As an example I have a pandas.DataFrame,

import pandas as pd

tst = pd.DataFrame({'A': [[1, 2]]* 5, 'B': [[3, 4]]* 5, 'C': [[5, 6]] * 5})

I can easily unpack one of the columns e.g. A into multiple columns,

pd.DataFrame(tst['A'].to_list(), 
             columns=['1' + tst['A'].name, '2' + tst['A'].name],
             index=list(range(tst['A'].shape[0]))
            )

However when I tried expanding this to multiple columns using .apply to avoid a for loop,

tst.apply(
    lambda x: pd.DataFrame(x.to_list(), 
                           columns=['1' + x.name, '2' + x.name], 
                           index=list(range(x.shape[0]))
                          )
)

I get the error below, even though I am supplying an index:

ValueError: If using all scalar values, you must pass an index

Is there a way to fix this so that I get an output as per below? (column order doesn't matter)

    1C  2C  1B  2B  1A  2A
0   5   6   3   4   1   2
1   5   6   3   4   1   2
2   5   6   3   4   1   2
3   5   6   3   4   1   2
4   5   6   3   4   1   2

pd.__version__ == '1.0.5'

Upvotes: 1

Views: 225

Answers (2)

Shubham Sharma

Reputation: 71687

You can horizontally stack the columns, then create a new dataframe and rename the columns:

import numpy as np

df = pd.DataFrame(np.hstack(tst.values.T.tolist()))
df.columns = [f'{i}{c}' for c in tst for i in range(1,3)]

Alternatively you can concat along axis=1:

df = pd.concat([pd.DataFrame(tst[c].tolist()) for c in tst], axis=1)
df.columns = [f'{i}{c}' for c in tst for i in range(1,3)]

print(df)

   1A  2A  1B  2B  1C  2C
0   1   2   3   4   5   6
1   1   2   3   4   5   6
2   1   2   3   4   5   6
3   1   2   3   4   5   6
4   1   2   3   4   5   6
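
Both snippets above hard-code two elements per list through range(1,3) and reset the index. As a minimal sketch of the same concat idea, the version below derives the number of new columns from the data and keeps the original index. The helper name unpack_list_columns is hypothetical, and it assumes every cell in every column holds a list.

```python
import pandas as pd


def unpack_list_columns(df):
    """Expand each column of lists into numbered columns (hypothetical helper)."""
    parts = []
    for col in df.columns:
        # One row per original row, one column per list element.
        expanded = pd.DataFrame(df[col].tolist(), index=df.index)
        # Name the new columns 1<col>, 2<col>, ... as in the question.
        expanded.columns = [f"{i + 1}{col}" for i in range(expanded.shape[1])]
        parts.append(expanded)
    # Place the expanded blocks side by side.
    return pd.concat(parts, axis=1)


tst = pd.DataFrame({'A': [[1, 2]] * 5, 'B': [[3, 4]] * 5, 'C': [[5, 6]] * 5})
result = unpack_list_columns(tst)
print(result)
```

If the lists within one column differ in length, pd.DataFrame pads the shorter rows with NaN, so that case would need separate handling.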

Upvotes: 1

antoine

Reputation: 672

If you don't mind switching from apply to explode, this is a one-line solution.

res = pd.concat([pd.DataFrame(tst[[x]].explode(x).values.reshape(-1, 2), columns=['1' + x, '2' + x]) for x in tst.columns], axis=1)
print(res)

Which returns:

  1A 2A 1B 2B 1C 2C
0  1  2  3  4  5  6
1  1  2  3  4  5  6
2  1  2  3  4  5  6
3  1  2  3  4  5  6
4  1  2  3  4  5  6
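
The reshape(-1, 2) above only works when every list has exactly two elements, and going through .values drops the original index. A hedged sketch of the same explode idea that reads the list length from the first cell and passes the index back in; the variable n and the letter index are assumptions added for illustration:

```python
import pandas as pd

# Same data as in the question, but with a non-default index for illustration.
tst = pd.DataFrame({'A': [[1, 2]] * 5, 'B': [[3, 4]] * 5, 'C': [[5, 6]] * 5},
                   index=list('vwxyz'))

# Assumption: every list in every column has the same length as the first cell.
n = len(tst.iloc[0, 0])

res = pd.concat(
    [pd.DataFrame(tst[c].explode().to_numpy().reshape(-1, n),  # n values per original row
                  columns=[f'{i}{c}' for i in range(1, n + 1)],
                  index=tst.index)                              # restore the original index
     for c in tst.columns],
    axis=1,
)
print(res)
```

Because explode returns object dtype, the resulting columns are also object; calling res.infer_objects() afterwards is one way to get numeric dtypes back.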

Upvotes: 1
