Reputation: 4797
I am trying to generate a 7th column in a dataframe:
arb_ser_num = 'zDfDD45'
predefined_number = 878
                  DATE                   Q1    Q2    Q3    Q4    Q5
0  2012-08-20 00:00:00  [Atlantic, Z, dEdd]  None  None  None  None
1  2012-08-21 00:00:00   [Pacific, Y, dEdd]  None  None  None  None
2  2012-08-22 00:00:00    [Indian, Y, dRdd]  None  None  None  None
3  2012-08-23 00:00:00   [Meditar, Z, dEdd]  None  None  None  None
4  2012-08-24 00:00:00    [Arctic, Z, dRdd]  None  None  None  None
df['Q6'] = df.apply(lambda row: get_q6(arb_ser_num, row, predefined_number), axis = 1)
Sometimes get_q6 will return [1,2,3,4,5] and other times it will return [None]. I keep getting the error:
Shape of passed values is (5,), indices imply (5, 6)
and I am not sure how to fix it. I found something similar here but I don't think it applies to me. I am trying to track ocean temperatures/currents.
Upvotes: 3
Views: 5363
Reputation: 1025
Solution, TL;DR
Make the function return as many elements as there are columns in the original dataframe. In this case, make get_q6 return 6 elements, so that the first row of the returned array has exactly 6 elements.
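One way to follow this suggestion is to pad whatever get_q6 produces up to the column count. The sketch below is only an illustration: pad_to_width is a hypothetical helper, not part of pandas, and whether padding alone makes the error go away may depend on the pandas version in use.

```python
def pad_to_width(values, width, fill=None):
    """Return a list of exactly `width` items (hypothetical helper)."""
    values = list(values)
    if len(values) > width:
        raise ValueError("result is longer than the number of columns")
    # Append the fill value until the list reaches the requested width.
    return values + [fill] * (width - len(values))


n_columns = 6  # DATE, Q1..Q5 in the question's dataframe

full = pad_to_width([1, 2, 3, 4, 5], n_columns)
empty = pad_to_width([None], n_columns)
```

Inside get_q6, the final return statement could be wrapped in such a helper so that both the [1,2,3,4,5] case and the [None] case come back with 6 elements.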
Reason
Going through the pandas source code: in your case, the original dataframe implies a shape of implied=(5,6). So internals.construction_error() inside pandas tries to verify that the array returned after applying get_q6 has the same shape.
In the returned array, you have 5 rows because the function is applied to each row. To find the number of columns, pandas takes the first row of the returned array. If get_q6 returned 6 elements, both shapes would be (5,6) and the check would pass.
But in your case, each row of the returned array has either 5 elements (when get_q6 returns [1,2,3,4,5]) or just 1 (when get_q6 returns [None]), not the 6 elements it expects. Probably get_q6 returns [None] for the first row, so the shape of the returned array is calculated as passed=(5,1).
Finally, implied==passed evaluates to False and pandas raises the error.
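The comparison described above can be mimicked in a few lines of plain Python. This is a simplified illustration of the reasoning, not pandas's actual code; passed_shape and the sample results list are made up for the example.

```python
def passed_shape(results):
    """Infer a 2-D shape the way the answer describes (illustrative only)."""
    # Rows: one result per dataframe row.
    # Columns: taken from the length of the first result.
    return (len(results), len(results[0]))


implied = (5, 6)  # 5 rows, 6 columns in the original dataframe

# Assumed outputs of get_q6, with [None] coming back for the first row.
results = [[None], [1, 2, 3, 4, 5], [None], [1, 2, 3, 4, 5], [None]]

passed = passed_shape(results)
shapes_match = implied == passed  # False, which is when the error is raised
```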
Upvotes: 1
Reputation: 31
I also experienced this error. It turned out that the pandas Time Series data type was causing the problem. When I applied the function with the time expressed as an epoch value (or anything else), it succeeded, but with the time converted to a pandas Time Series, I got this error. So my suggestion is to convert to Time Series after you apply the function, which obviously only works if you don't need your time variable in the function being applied.
*I have not tested apply with pandas Time Spans.
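A minimal sketch of that ordering, assuming a reasonably recent pandas in which apply returns list results as a Series of lists. The get_q6 below is a hypothetical stand-in for the question's function, and the epoch values and three-row frame are made up for illustration.

```python
import pandas as pd


def get_q6(row):
    # Hypothetical stand-in: a list of 5 values for "Z" rows, [None] otherwise.
    return [1, 2, 3, 4, 5] if row["Q1"][1] == "Z" else [None]


df = pd.DataFrame({
    # Time kept as epoch seconds while the function is applied.
    "DATE": [1345420800, 1345507200, 1345593600],
    "Q1": [["Atlantic", "Z", "dEdd"],
           ["Pacific", "Y", "dEdd"],
           ["Indian", "Y", "dRdd"]],
})

# Apply first, while DATE is still a plain integer column...
df["Q6"] = df.apply(get_q6, axis=1)

# ...and only then convert DATE to pandas datetimes.
df["DATE"] = pd.to_datetime(df["DATE"], unit="s")
```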
Upvotes: 3