Reputation: 211
I am trying to select values from two different DataFrames based on certain column values and divide them with each other. If I try this I always get NaN values.
I added a simplified example below:
df = pd.DataFrame({'col1':['a','b','c','d'],
'col2':['x1','x2','x3','x4'],
'col3':[10,3,2,8]})
print(df)
df1 = pd.DataFrame({'col1':['a1','b1','c1','d1'],
'col2':['y1','y1','y3','y4'],
'col3':[5,4,1,6]})
print(df1)
a = df.loc[((df['col1']=='a')&(df['col2']=='x1')),'col3']
print(a)
b = df1.loc[((df1['col1']=='d1')&(df1['col2']=='y4')), 'col3']
print(b)
c = a/b
print(c)
How can I overcome this problem?
Upvotes: 0
Views: 346
Reputation: 42916
Pandas is so powerful because it aligns on the indices when doing certain actions, for example mathematical methods.
If we print both a
and b
in your example, we can see that their indices are respectively 0
and 3
:
print(a.index)
print(b.index)
Int64Index([0], dtype='int64')
Int64Index([3], dtype='int64')
This means that when doing the operation a/b
, pandas cannot align any values and thus returns NaN
.
Your solution would be to reset_index
:
a.reset_index(drop=True) / b.reset_index(drop=True)
0 1.666667
Name: col3, dtype: float64
Or cast to numpy
arrays so we lose the indices:
a.to_numpy() / b.to_numpy()
array([1.66666667])
Having that said, your operation seems not really logical and the problem lies probably deeper. Since the matching of the values does not make any sense right now.
Upvotes: 1