Reputation: 971
I have the following NumPy array:
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
The array is called myArray. I slice the 2D array in two ways and get the following results:
In[1]: a2 = myArray[1:]
a2
Out[1]: array([[3, 4],
               [5, 6],
               [7, 8]])
In[2]: a1 = myArray[:-1]
a1
Out[2]: array([[1, 2],
               [3, 4],
               [5, 6]])
Next, I use NumPy functions to compute the angle between each pair of consecutive rows:
In[]: theta = np.arccos((a1*a2).sum(axis= 1)/(np.sqrt((a1**2).sum(axis= 1)*(a2**2).sum(axis= 1))))
theta
Out[]: array([ 0.1798535 , 0.05123717, 0.02409172])
I perform the same sequence of operations on an equivalent data frame:
In[]: df = pd.DataFrame(data = myArray, columns = ["x", "y"])
df
Out[]:
   x  y
0  1  2
1  3  4
2  5  6
3  7  8
In[]: b2 = df[["x", "y"]].iloc[1:]
b2
Out[]:
   x  y
1  3  4
2  5  6
3  7  8
In[]: b1 = df[["x", "y"]].iloc[:-1]
b1
Out[]:
   x  y
0  1  2
1  3  4
2  5  6
But when I compute theta for the data frame in the same way, I get only zeros and NaN values:
In[]: theta2 = np.arccos((b1*b2).sum(axis= 1)/(np.sqrt((b1**2).sum(axis= 1)*(b2**2).sum(axis= 1))))
theta2
Out[]:
0 NaN
1 0.0
2 0.0
3 NaN
dtype: float64
Is this the right way to apply NumPy functions to sliced data frames? How can I get the same theta from the data frame as from the array?
UPDATE
As suggested below, using b1.values and b2.values works. But when I wrap the calculation in a function and apply it to df, I keep getting a ValueError:
def theta(group):
    b2 = df[["x", "y"]].iloc[1:]
    b1 = df[["x", "y"]].iloc[:-1]
    t = np.arccos((b1.values*b2.values).sum(axis= 1)/
                  (np.sqrt((b1.values**2).sum(axis= 1)*(b2.values**2).sum(axis= 1))))
    return t
df2 = df.apply(theta)
This raises the following error:
ValueError: Shape of passed values is (2, 3), indices imply (2, 4)
Please let me know where I am going wrong.
Thanks in advance.
Upvotes: 0
Views: 3376
Reputation: 19957
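The indexes of b1 and b2 are not aligned. b1 has labels 0, 1, 2 and b2 has labels 1, 2, 3, and pandas matches rows by index label rather than by position. Labels 1 and 2 therefore pair each row with itself (angle 0), while labels 0 and 3 have no partner and produce NaN.
If you give b2 the same index as b1:
b2.index=b1.index
np.arccos((b1*b2).sum(axis= 1)/(np.sqrt((b1**2).sum(axis= 1)*(b2**2).sum(axis= 1))))
you should get: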
Out[75]:
0 0.179853
1 0.051237
2 0.024092
dtype: float64
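To make the alignment behaviour concrete, here is a minimal sketch that rebuilds the question's frames and shows what `b1 * b2` pairs up. The names `product`, `b2_pos` and `theta_aligned` are only illustrative, and `reset_index(drop=True)` is used as an alternative to assigning `b2.index` directly; it leaves `b2` itself unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2], [3, 4], [5, 6], [7, 8]]), columns=["x", "y"])
b1 = df.iloc[:-1]   # index labels 0, 1, 2
b2 = df.iloc[1:]    # index labels 1, 2, 3

# Arithmetic between DataFrames matches rows by label, not by position.
# Labels 0 and 3 exist in only one operand, so those rows come out as NaN;
# labels 1 and 2 multiply each row by itself.
product = b1 * b2
print(product)

# Dropping b2's labels restores positional pairing (row i with row i + 1).
b2_pos = b2.reset_index(drop=True)
dot = (b1 * b2_pos).sum(axis=1)
norms = np.sqrt((b1**2).sum(axis=1) * (b2_pos**2).sum(axis=1))
theta_aligned = np.arccos(dot / norms)
print(theta_aligned)
```

Printing `product` shows four rows, with NaN in the first and last, which is where the NaN entries in the question's output come from.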
If you don't want to change the index, you can work on the underlying NumPy arrays by calling .values on b1 and b2:
np.arccos((b1.values*b2.values).sum(axis= 1)/(np.sqrt((b1.values**2).sum(axis= 1)*(b2.values**2).sum(axis= 1))))
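Regarding the function in the UPDATE: `DataFrame.apply` calls the function once per column by default, and the function shown ignores its argument and returns three values for each four-row column, which is probably where the shape mismatch comes from. A minimal sketch of one way around this is to take the frame as a parameter and call the function directly instead of going through `apply`. The name `consecutive_angles`, its `cols` parameter, and the `np.clip` guard are assumptions for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd


def consecutive_angles(frame, cols=("x", "y")):
    """Angle in radians between each row vector and the next one."""
    values = frame[list(cols)].to_numpy(dtype=float)
    v1, v2 = values[:-1], values[1:]
    dot = (v1 * v2).sum(axis=1)
    norms = np.sqrt((v1**2).sum(axis=1) * (v2**2).sum(axis=1))
    # Clip guards against rounding pushing the cosine slightly outside [-1, 1].
    return np.arccos(np.clip(dot / norms, -1.0, 1.0))


df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["x", "y"])
# Call the function on the whole frame; df.apply would pass one column at a time.
theta = consecutive_angles(df)
print(theta)
```

The result has one fewer element than the frame has rows, so it cannot be assigned back as a column without deciding which row each angle belongs to. If the `group` parameter in the question was meant for per-group use, the same function could be passed to a groupby `apply`, assuming a suitable key column exists.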
Upvotes: 2