BKS
BKS

Reputation: 2333

Division in pandas not working as it should

I have two dataframes each with one column. I'm pasting them exactly as they print below:

Top: (it has no column names as it is the result of a Top = Df1.groupby('col1')['att1'].diff().dropna()

1     15.566667
3      5.066667
5     57.266667
7    -10.366667
9     18.966667
11    50.966667
13    -5.633333
15   -14.266667
17    18.933333
19     3.100000
21    35.966667
23   -17.566667
25    -8.066667
27    -6.366667
29     7.133333
31    -2.633333
33     3.333333
35   -23.800000
37     2.333333
39   -53.533333
41   -17.300000
dtype: float64

Bottom: which is the result of Bottom = np.sqrt(Df2.groupby('ID')['Col2'].sum()/n)

ID
12868123    1.029001
757E13D7    1.432014
79731492    2.912770
799EFB29    1.826576
7D44062A    1.736757
7D4C0E2F    1.943503
7DBA169D    0.650023
7E558E2B    1.256287
7E8B3815    1.491974
7EB80123    0.558717
7FFB607D    1.505221
8065A321    1.809937
80EFE91B    2.064825
811F1B1E    0.992645
82B67C94    0.980618
833C27AE    0.969195
83957B28    0.469914
8447B85D    1.477168
84877498    0.872973
8569499D    2.215307
8617B7D9    1.033294
Name: Col2, dtype: float64

I want the divide those two columns values by each other.

Top/Bottom

I get the following:

1          NaN
3          NaN
5          NaN
7          NaN
9          NaN
11         NaN
13         NaN
15         NaN
17         NaN
19         NaN
21         NaN
23         NaN
25         NaN
27         NaN
29         NaN
31         NaN
33         NaN
35         NaN
37         NaN
39         NaN
41         NaN
12868123   NaN
757E13D7   NaN
79731492   NaN
799EFB29   NaN
7D44062A   NaN
7D4C0E2F   NaN
7DBA169D   NaN
7E558E2B   NaN
7E8B3815   NaN
7EB80123   NaN
7FFB607D   NaN
8065A321   NaN
80EFE91B   NaN
811F1B1E   NaN
82B67C94   NaN
833C27AE   NaN
83957B28   NaN
8447B85D   NaN
84877498   NaN
8569499D   NaN
8617B7D9   NaN
dtype: float64

I tried resetting the index column, it didn't help. Not sure why it's not working.

Upvotes: 1

Views: 4529

Answers (2)

brodegon
brodegon

Reputation: 251

I arrived here because I was looking how to divide a column by a subset of itself.

I found a solution which is not reported here

Suppose you have a df like

d = {'mycol1':[0,0,1,1,2,2],'mycol2':[1,2,3,6,4,8]}
df = pd.DataFrame(data=d)

i.e.

   mycol1  mycol2
0       0       1
1       0       2
2       1       3
3       1       6
4       2       4
5       2       8

And now you want to divide mycol2 for a subset composed by the first two values

df['mycol2'].div(df[df['mycol1']==0.0]['mycol2'])

will result in

0    1.0
1    1.0
2    NaN
3    NaN
4    NaN
5    NaN

because of the index problem reported by jezreal.

The solution is to simply use concat to concatenate the subset to match the length of the original df.

Nrows = df[df['mycol1'==0.0]]['mycol2'].shape[0]
Nrows_tot = df['mycol2'].shape[0]
times_longer = int(Nrows_tot/Nrows)
df['mycol3'] = df['mycol2'].div(pd.concat([df[df['mycol1']==0.0]['mycol2']]*times_longer,ignore_index=True))

Upvotes: 0

jezrael
jezrael

Reputation: 863166

Problem is with different index values, because arithmetic opearations align Series by indices, so need cast to numpy array by values:

print (Top/Bottom.values)
1     15.127942
3      3.538141
5     19.660552
7     -5.675464
9     10.920737
11    26.224126
13    -8.666359
15   -11.356216
17    12.690123
19     5.548426
21    23.894609
23    -9.705679
25    -3.906707
27    -6.413841
29     7.274324
31    -2.717031
33     7.093496
35   -16.111911
37     2.672858
39   -24.165198
41   -16.742573
Name: col, dtype: float64

Solution with div:

print (Top.div(Bottom.values))
1     15.127942
3      3.538141
5     19.660552
7     -5.675464
9     10.920737
11    26.224126
13    -8.666359
15   -11.356216
17    12.690123
19     5.548426
21    23.894609
23    -9.705679
25    -3.906707
27    -6.413841
29     7.274324
31    -2.717031
33     7.093496
35   -16.111911
37     2.672858
39   -24.165198
41   -16.742573
dtype: float64

But if assign one index values to another, you can use:

Top.index = Bottom.index
print (Top/Bottom)
ID
12868123    15.127942
757E13D7     3.538141
79731492    19.660552
799EFB29    -5.675464
7D44062A    10.920737
7D4C0E2F    26.224126
7DBA169D    -8.666359
7E558E2B   -11.356216
7E8B3815    12.690123
7EB80123     5.548426
7FFB607D    23.894609
8065A321    -9.705679
80EFE91B    -3.906707
811F1B1E    -6.413841
82B67C94     7.274324
833C27AE    -2.717031
83957B28     7.093496
8447B85D   -16.111911
84877498     2.672858
8569499D   -24.165198
8617B7D9   -16.742573
dtype: float64

And if get error like:

ValueError: operands could not be broadcast together with shapes (20,) (21,)

problem is with different length of Series.

Upvotes: 3

Related Questions