user3768495

Reputation: 4657

How to iterate through a column in dataframe and update two new columns simultaneously?

I understand I can add a column to a dataframe and update its values to the values returned from a function, like this:

import pandas as pd

df=pd.DataFrame({'x':[1,2,3,4]})

def square(x):
    return x*x

df['x_squared'] = [square(i) for i in df['x']]

However, my actual function returns two items, and I want to put them in two different new columns. Here is some pseudocode that describes the problem more clearly:

df=pd.DataFrame({'x':[1,2,3,4]})

def squareAndCube(x):
    return x*x, x*x*x

# below is pseudocode
df['x_squared'], df['x_cubed'] = [squareAndCube(i) for i in df['x']]

The code above gives me an error message saying "too many values to unpack". So, how should I fix this?

Upvotes: 3

Views: 407

Answers (3)

Divakar

Reputation: 221574

You could do it in a vectorized fashion, like so -

df['x_squared'], df['x_cubed'] = df.x**2,df.x**3

Or with that custom function, like so -

df['x_squared'], df['x_cubed'] = squareAndCube(df.x)
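
The custom function works on the whole column because * on a pandas Series is elementwise, so the function returns a tuple of two Series that unpacks straight into the two columns. A minimal sketch, assuming the same df and squareAndCube as in the question (the name result is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

def squareAndCube(x):
    return x*x, x*x*x

# Passing the whole Series: x*x and x*x*x are computed elementwise,
# so the return value is a tuple of two Series, one per new column.
result = squareAndCube(df['x'])

df['x_squared'], df['x_cubed'] = result
```

This only holds when every operation inside the function is itself vectorized; a function with plain Python branching on x would need the loop-based approach instead.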

Back to your loopy case, on the right side of the assignment, you had:

In [101]: [squareAndCube(i) for i in df['x']]
Out[101]: [(1, 1), (4, 8), (9, 27), (16, 64)]

Now, on the left side, you had df['x_squared'], df['x_cubed'] =. So, it's expecting the right side to unpack into exactly two items: the squared numbers of all the rows first, then the cubed numbers of all the rows. The list shown above has four items instead, one per row, and its first element is the square and cube of the first row. That mismatch is what raises "too many values to unpack". So, the fix is to "transpose" that list and assign the result as the new columns. Thus, the fix would be -

In [102]: L = [squareAndCube(i) for i in df['x']]

In [103]: list(map(list, zip(*L)))  # Transposed list (on Python 3, map is lazy, so wrap it in list() to display it)
Out[103]: [[1, 4, 9, 16], [1, 8, 27, 64]]

In [104]: df['x_squared'], df['x_cubed'] = map(list, zip(*L))
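
As a self-contained script, the transposition could look roughly like this. It is only a sketch of the idea above; the names squares and cubes are mine, and zip(*L) yields exactly two tuples here, so it unpacks into two names directly:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

def squareAndCube(x):
    return x*x, x*x*x

# One (square, cube) tuple per row: four items in total.
L = [squareAndCube(i) for i in df['x']]

# zip(*L) regroups the pairs by position: all squares, then all cubes.
squares, cubes = zip(*L)

df['x_squared'] = list(squares)
df['x_cubed'] = list(cubes)
```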

For the love of NumPy broadcasting!

df['x_squared'], df['x_cubed'] = (df.x.values[:,None]**[2,3]).T
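
To see what the broadcasting one-liner does, it may help to split it into steps and look at the shapes. A sketch with intermediate names (col, powers) that are not in the one-liner above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

# [:, None] adds a trailing axis: shape (4,) becomes (4, 1).
col = df['x'].values[:, None]

# (4, 1) ** (2,) broadcasts to (4, 2): column 0 holds squares, column 1 cubes.
powers = col ** np.array([2, 3])

# Transposing gives shape (2, 4); unpacking iterates over its two rows.
df['x_squared'], df['x_cubed'] = powers.T
```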

Upvotes: 3

piRSquared

Reputation: 294318

This works for positive numbers only, because it goes through the logarithm. I'm still thinking about how to generalize it, but the brevity of this solution has me distracted.

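import numpy as np
import pandas as pd

df = pd.DataFrame(range(1, 10))
a = np.arange(1, 4).reshape(1, -1)  # exponents 1, 2, 3 as a row vector

np.exp(np.log(df).dot(a))  # exp(k * log(x)) == x**k for x > 0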
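
One possible way to generalize to zero and negative values is to skip the logarithm and broadcast the power directly. This is a sketch of an alternative, not the trick above; the sample values and the column names are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [-2, -1, 0, 1, 2]})

exponents = np.arange(1, 4)  # [1, 2, 3]

# (5, 1) ** (3,) broadcasts to (5, 3): one column per exponent.
powers = df['x'].to_numpy()[:, None] ** exponents

result = pd.DataFrame(powers, index=df.index,
                      columns=['x', 'x_squared', 'x_cubed'])
```

Integer powers stay exact this way, whereas the exp/log route goes through floating point.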


Upvotes: 1

Matthew

Reputation: 11347

How about using df.loc like this:

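import pandas as pd

df=pd.DataFrame({'x':[1,2,3,4]})

def squareAndCube(x):
    return x*x, x*x*x

# create the target columns first, then fill both in one assignment
df['x_squared'] = df['x_cubed'] = None
df.loc[:, ['x_squared', 'x_cubed']] = [squareAndCube(i) for i in df['x']]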

gives

   x  x_squared  x_cubed
0  1          1        1
1  2          4        8
2  3          9       27
3  4         16       64

This is very close to what you had, but the columns need to exist for df.loc to work.

For the uninitiated, df.loc takes two indexers: the rows you want to work on (in this case :, which means all of them) and the columns (here ['x_squared', 'x_cubed']).
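
To make the two indexers concrete, here is a small sketch that only reads from an already filled frame. The names all_rows and some_rows are just for illustration:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4],
                   'x_squared': [1, 4, 9, 16],
                   'x_cubed': [1, 8, 27, 64]})

# First indexer selects rows (':' means all), second selects columns.
all_rows = df.loc[:, ['x_squared', 'x_cubed']]

# A list of index labels picks specific rows instead.
some_rows = df.loc[[0, 2], ['x_squared', 'x_cubed']]
```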

Upvotes: 0
