Reputation: 591
I'm trying to implement an apply function that returns two values, because the calculations are similar and quite time-consuming and I don't want to run apply twice. The example below is a deliberately simple MWE, and I know there are easier ways to achieve what it does. My actual function is more complicated, but I already run into an error with this MWE.
This version works:
import numpy as np
import pandas as pd

def function(row):
    return [row.A, row.A/2]
df = pd.DataFrame({'A' : np.random.randn(8),
                   'B' : np.random.randn(8)})
df[['D','E']] = df.apply(lambda row: function(row), axis=1).apply(pd.Series)
However, this does not:
df2 = pd.DataFrame({'A' : np.random.randn(8),
                    'B' : pd.date_range('1/1/2011', periods=8, freq='H'),
                    'C' : np.random.randn(8)})
df2[['D','E']] = df2.apply(lambda row: function(row), axis=1).apply(pd.Series)
Instead, it gives me ValueError: Shape of passed values is (8, 2), indices imply (8, 3)
I don't understand why changing the type of the B column would affect the outcome, since it is not used in the apply function at all.
I guess I could avoid this issue in the example by temporarily excluding the date column. However, my real function will need to use the date.
Can someone explain why this example does not work? What changes when a timestamp column is included?
Upvotes: 3
Views: 747
Reputation: 294338
Have function return a pd.Series instead. Returning a list makes apply try to fit the list into the existing row, which has three columns here, so a two-element list does not fit. Returning a pd.Series tells pandas to treat the result as a new row with its own index.
def function(row):
    return pd.Series([row.A, row.A/2])
df2 = pd.DataFrame({'A' : np.random.randn(8),
                    'B' : pd.date_range('1/1/2011', periods=8, freq='H'),
                    'C' : np.random.randn(8)})
df2[['D','E']] = df2.apply(function, axis=1)
df2
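As a possible alternative (not part of the original answer), newer pandas versions accept a result_type argument on DataFrame.apply. With result_type='expand', list-like return values are spread into columns, so the function can keep returning a plain list. This is only a sketch: it assumes a pandas version that supports result_type, and it uses the default daily frequency for the date column to keep the example simple.

```python
import numpy as np
import pandas as pd

def function(row):
    # Still returns a plain list of two values
    return [row.A, row.A / 2]

df2 = pd.DataFrame({
    'A': np.random.randn(8),
    # Default daily frequency; the original used hourly timestamps
    'B': pd.date_range('2011-01-01', periods=8),
    'C': np.random.randn(8),
})

# result_type='expand' turns each returned list into one row of a new DataFrame
expanded = df2.apply(function, axis=1, result_type='expand')
df2[['D', 'E']] = expanded
```

Here expanded is an 8 x 2 DataFrame with columns 0 and 1, which are assigned positionally to D and E.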
Attempt to explain
s = pd.Series([1, 2, 3])
s
0    1
1    2
2    3
dtype: int64
s.loc[:] = [4, 5, 6]
s
0    4
1    5
2    6
dtype: int64
s.loc[:] = [7, 8]
ValueError: cannot set using a slice indexer with a different length than the value
Upvotes: 1