Reputation: 4657
I understand I can add a column to a dataframe and update its values to the values returned from a function, like this:
import pandas as pd
df = pd.DataFrame({'x':[1,2,3,4]})
def square(x):
    return x*x
df['x_squared'] = [square(i) for i in df['x']]
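As an aside, the same one-column result can be had without writing the list comprehension yourself: Series.map calls a function once per element. A minimal sketch, reusing the question's square function:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

def square(x):
    return x * x

# Series.map calls square() once per element, like the list comprehension does
df['x_squared'] = df['x'].map(square)
print(df['x_squared'].tolist())  # [1, 4, 9, 16]
```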
However, my actual function returns two values, and I want to put them in two different new columns. Here is some pseudo-code to describe the problem more clearly:
df = pd.DataFrame({'x':[1,2,3,4]})
def squareAndCube(x):
    return x*x, x*x*x
# the following line is what I would like to do
df['x_squared'], df['x_cubed'] = [squareAndCube(i) for i in df['x']]
The code above fails with "ValueError: too many values to unpack". How should I fix this?
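A minimal sketch reproducing the error, assuming Python 3 (the exact wording of the message varies slightly between Python versions). The list comprehension produces four (square, cube) tuples, one per row, while the left-hand side has only two targets:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

def squareAndCube(x):
    return x * x, x * x * x

pairs = [squareAndCube(i) for i in df['x']]
print(len(pairs))  # 4: one (square, cube) tuple per row

error_message = None
try:
    # Python unpacks the right-hand side before assigning anything,
    # so four items into two targets fails and no column is created
    df['x_squared'], df['x_cubed'] = pairs
except ValueError as exc:
    error_message = str(exc)

print(error_message)
print(list(df.columns))  # ['x']
```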
Upvotes: 3
Views: 407
Reputation: 221574
You could do it in a vectorized fashion, like so:
df['x_squared'], df['x_cubed'] = df.x**2, df.x**3
Or with your custom function, which also works on a whole column at once:
df['x_squared'], df['x_cubed'] = squareAndCube(df.x)
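This works because arithmetic on a pandas Series is element-wise: given the whole column, squareAndCube returns a tuple of two Series, which unpacks neatly into the two targets. A small sketch to check that:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

def squareAndCube(x):
    return x * x, x * x * x

# Called on the whole Series, the arithmetic is vectorized, so the function
# returns a tuple of two Series (not one tuple per row)
result = squareAndCube(df.x)
print(type(result[0]).__name__)  # Series

df['x_squared'], df['x_cubed'] = result
print(df)
```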
Back to your loop-based case: on the right side of the assignment, you had:
In [101]: [squareAndCube(i) for i in df['x']]
Out[101]: [(1, 1), (4, 8), (9, 27), (16, 64)]
Now, on the left side, you had df['x_squared'], df['x_cubed'] =, so Python expects exactly two items on the right, the first being the squares of all the rows. The list shown above has four items instead (one per row), hence "too many values to unpack", and its first element isn't the column of squares anyway: it is the square and cube of the first row. So the fix is to "transpose" that list and assign its two rows as the new columns:
In [102]: L = [squareAndCube(i) for i in df['x']]
In [103]: list(map(list, zip(*L)))  # transposed list (list(...) is needed in Python 3 to display it)
Out[103]: [[1, 4, 9, 16], [1, 8, 27, 64]]
In [104]: df['x_squared'], df['x_cubed'] = map(list, zip(*L))
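For readers less familiar with the zip(*L) idiom, here it is as a self-contained script. Note that in Python 3 zip and map return lazy iterators rather than lists, which is why the sketch converts the tuples to lists explicitly before assigning them:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

def squareAndCube(x):
    return x * x, x * x * x

L = [squareAndCube(i) for i in df['x']]   # [(1, 1), (4, 8), (9, 27), (16, 64)]

# zip(*L) is zip((1, 1), (4, 8), (9, 27), (16, 64)), which regroups
# the first items of every tuple together and the second items together
squares, cubes = zip(*L)
print(squares)  # (1, 4, 9, 16)
print(cubes)    # (1, 8, 27, 64)

df['x_squared'], df['x_cubed'] = list(squares), list(cubes)
print(df)
```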
For the love of NumPy broadcasting!
df['x_squared'], df['x_cubed'] = (df.x.values[:,None]**[2,3]).T
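To see why the broadcasting one-liner works, it helps to look at the intermediate shapes. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

col = df.x.values[:, None]   # shape (4, 1): one row per value
powers = col ** [2, 3]       # [2, 3] broadcasts as shape (2,), so the result is (4, 2)
print(powers.shape)          # (4, 2)

# .T has shape (2, 4); unpacking iterates over its two rows
df['x_squared'], df['x_cubed'] = powers.T
print(df)
```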
Upvotes: 3
Reputation: 294318
This works for positive numbers only, since it goes through a logarithm. I'm thinking about how to generalize it, but the brevity of this solution has me distracted.
import numpy as np
import pandas as pd
df = pd.DataFrame(range(1, 10))
a = np.arange(1, 4).reshape(1, -1)
np.exp(np.log(df).dot(a))
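The trick relies on the identity exp(k·log x) = x^k, applied to k = 1, 2, 3 at once through the matrix product; that is also why it needs x > 0. Note that the first output column is x itself (k = 1), and that the values are floats. A sketch that checks the output against plain powers, assuming agreement up to floating-point error is all you need:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(range(1, 10))        # a single column named 0
a = np.arange(1, 4).reshape(1, -1)     # [[1, 2, 3]], shape (1, 3)

# log(x) times [1, 2, 3] gives the columns k*log(x); exp turns them back into x**k.
# Only valid for x > 0: log(0) is -inf and the log of a negative number is nan.
out = np.exp(np.log(df).dot(a))
print(out.round(6))

# the results are floats, so compare with a tolerance rather than exactly
expected = np.arange(1, 10)[:, None] ** np.arange(1, 4)
print(np.allclose(out.values, expected))   # True
```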
Upvotes: 1
Reputation: 11347
How about using df.loc, like this:
import pandas as pd
df = pd.DataFrame({'x':[1,2,3,4]})
def squareAndCube(x):
    return x*x, x*x*x
df['x_squared'] = df['x_cubed'] = None
df.loc[:, ['x_squared', 'x_cubed']] = [squareAndCube(i) for i in df['x']]
This gives:
x x_squared x_cubed
0 1 1 1
1 2 4 8
2 3 9 27
3 4 16 64
This is very close to what you had, but the columns need to exist for df.loc to work.
For the uninitiated, df.loc is indexed with two selectors: the rows you want to work on (here :, which means all of them) and the columns (here the list ['x_squared', 'x_cubed']).
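A runnable version of the df.loc approach, with the two selectors spelled out. Initializing the new columns with 0 instead of None is my own choice here, so that the columns start out with an integer dtype; otherwise it follows the answer above:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4]})

def squareAndCube(x):
    return x * x, x * x * x

# df.loc[rows, columns]: ':' selects every row, the list selects two columns
cols = ['x_squared', 'x_cubed']

# Create the target columns first (0 is just a placeholder value)
for c in cols:
    df[c] = 0

# The right-hand side has one (square, cube) pair per row, i.e. a 4 x 2 block
df.loc[:, cols] = [squareAndCube(i) for i in df['x']]
print(df.loc[:, cols])
```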
Upvotes: 0