si_mon
si_mon

Reputation: 211

Question: Dividing values from two different Pandas Dataframes selected based on column values returns NaN

I am trying to select values from two different DataFrames based on certain column values and divide them with each other. If I try this I always get NaN values.

I added a simplified example below:

df = pd.DataFrame({'col1':['a','b','c','d'],
                  'col2':['x1','x2','x3','x4'],
                   'col3':[10,3,2,8]})

print(df)

df1 = pd.DataFrame({'col1':['a1','b1','c1','d1'],
                  'col2':['y1','y1','y3','y4'],
                   'col3':[5,4,1,6]})

print(df1)

a = df.loc[((df['col1']=='a')&(df['col2']=='x1')),'col3']
print(a)

b = df1.loc[((df1['col1']=='d1')&(df1['col2']=='y4')), 'col3']
print(b)

c = a/b

print(c)

How can I overcome this problem?

Upvotes: 0

Views: 346

Answers (1)

Erfan
Erfan

Reputation: 42916

Pandas is so powerful because it aligns on the indices when doing certain actions, for example mathematical methods.

If we print both a and b in your example, we can see that their indices are respectively 0 and 3:

print(a.index)
print(b.index)

Int64Index([0], dtype='int64')
Int64Index([3], dtype='int64')

This means that when doing the operation a/b, pandas cannot align any values and thus returns NaN.

Your solution would be to reset_index:

a.reset_index(drop=True) / b.reset_index(drop=True)

0    1.666667
Name: col3, dtype: float64

Or cast to numpy arrays so we lose the indices:

a.to_numpy() / b.to_numpy()

array([1.66666667])

Having that said, your operation seems not really logical and the problem lies probably deeper. Since the matching of the values does not make any sense right now.

Upvotes: 1

Related Questions