ValueError exception when creating a column from other columns in a DataFrame

Question

I am trying to create a column from two other columns in a DataFrame.
Consider this three-column DataFrame:

import numpy as np
import pandas as pd

random_list_1 = np.random.randint(1, 10, 5)
random_list_2 = np.random.randint(1, 10, 5)
random_list_3 = np.random.randint(1, 10, 5)

df = pd.DataFrame({"p": random_list_1, "q": random_list_2, "r": random_list_3})

I create the new column from "p" and "q" with a function passed to apply.
As a simple example:

def operate(row):
    return [row['p'], row['q']]

Here,

df['s'] = df.apply(operate, axis=1)

evaluates correctly and creates a column "s".

The issue appears when the DataFrame has as many columns as the list returned by operate has elements. For instance, with

df2 = pd.DataFrame({"p": random_list_1, "q": random_list_2})

evaluating this:

df2['s'] = df2.apply(operate, axis=1)

raises a ValueError:

ValueError: Wrong number of items passed 2, placement implies 1

What is happening?

As a workaround, I could make operate return tuples (which does not raise an exception) and then convert them to lists, but for performance's sake I would prefer to get lists in a single pass over the DataFrame.

Is there a way to achieve this?
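For reference, the tuple workaround described above looks roughly like this. It is a sketch that assumes a small fixed-value df2 in place of the random data, and operate_tuple is a hypothetical variant of operate. The .map(list) call is the second pass over the column that the question would like to avoid.

```python
import pandas as pd

# Small fixed-value stand-in for df2 from the question (random data replaced)
df2 = pd.DataFrame({"p": [1, 2, 3], "q": [4, 5, 6]})

def operate_tuple(row):
    # Hypothetical variant of operate that returns a tuple instead of a list
    return (row["p"], row["q"])

# First pass: apply yields one tuple per row, and no ValueError is raised
tuples = df2.apply(operate_tuple, axis=1)

# Second pass: convert each tuple to a list
df2["s"] = tuples.map(list)
print(df2)
```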

rpanai · Accepted Answer

In both cases this works for me:

df["s"] = list(np.column_stack((df.p.values, df.q.values)))

Working with vectorized functions is better than using apply. In this case the speedup is 3x. See the documentation.
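Note that the line above stores a 1-D NumPy array in each cell, not a Python list. If real lists are needed, one possible variation (my own suggestion, not part of the original answer) is to call .tolist() on the stacked array, which still avoids calling a Python function per row. The sketch assumes fixed values instead of random ones and uses .to_numpy(), the newer spelling of .values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"p": [1, 2, 3], "q": [4, 5, 6], "r": [7, 8, 9]})

# Stack "p" and "q" side by side into an (n_rows, 2) array
stacked = np.column_stack((df["p"].to_numpy(), df["q"].to_numpy()))

# As in the answer: each cell holds one row of the array (a NumPy array)
df["s"] = list(stacked)

# Variation: .tolist() produces nested Python lists, so each cell is a list
df["s_list"] = stacked.tolist()
print(df)
```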

Anyway, I found your question interesting, and I'd like to know why this is happening.
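A possible explanation, as I understand the apply changes described in the pandas 0.23 release notes: older versions of DataFrame.apply with axis=1 expanded a returned list into a DataFrame row whenever its length matched the number of columns, and returned a Series of lists otherwise. With df2 the list has two elements and df2 has two columns, so apply produced a two-column DataFrame, and assigning that to the single key 's' fails. The sketch below reproduces this explicitly with result_type="expand" (a parameter added in 0.23), using a small fixed-value df2, and then uses result_type="reduce", which asks apply to return a Series so that each cell keeps its list in one pass. On recent pandas versions a returned list already yields a Series by default, so the original code may not raise at all.

```python
import pandas as pd

# Small fixed-value stand-in for df2 from the question (two columns)
df2 = pd.DataFrame({"p": [1, 2, 3], "q": [4, 5, 6]})

def operate(row):
    return [row["p"], row["q"]]

# result_type="expand" turns each returned list into a DataFrame row,
# mimicking what older pandas did when the list length matched the column count
expanded = df2.apply(operate, axis=1, result_type="expand")
print(expanded.shape)  # (3, 2)

# Assigning a two-column DataFrame to a single column key raises ValueError
# (the exact message differs between pandas versions)
try:
    df2["s"] = expanded
    assignment_failed = False
except ValueError as exc:
    assignment_failed = True
    print("ValueError:", exc)

# result_type="reduce" asks apply for a Series, so each cell holds one list
df2["s"] = df2.apply(operate, axis=1, result_type="reduce")
print(df2)
```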
