Reputation: 2333
I have two dataframes each with one column. I'm pasting them exactly as they print below:
Top: (it has no column names as it is the result of a Top = Df1.groupby('col1')['att1'].diff().dropna()
1 15.566667
3 5.066667
5 57.266667
7 -10.366667
9 18.966667
11 50.966667
13 -5.633333
15 -14.266667
17 18.933333
19 3.100000
21 35.966667
23 -17.566667
25 -8.066667
27 -6.366667
29 7.133333
31 -2.633333
33 3.333333
35 -23.800000
37 2.333333
39 -53.533333
41 -17.300000
dtype: float64
Bottom: which is the result of Bottom = np.sqrt(Df2.groupby('ID')['Col2'].sum()/n)
ID
12868123 1.029001
757E13D7 1.432014
79731492 2.912770
799EFB29 1.826576
7D44062A 1.736757
7D4C0E2F 1.943503
7DBA169D 0.650023
7E558E2B 1.256287
7E8B3815 1.491974
7EB80123 0.558717
7FFB607D 1.505221
8065A321 1.809937
80EFE91B 2.064825
811F1B1E 0.992645
82B67C94 0.980618
833C27AE 0.969195
83957B28 0.469914
8447B85D 1.477168
84877498 0.872973
8569499D 2.215307
8617B7D9 1.033294
Name: Col2, dtype: float64
I want the divide those two columns values by each other.
Top/Bottom
I get the following:
1 NaN
3 NaN
5 NaN
7 NaN
9 NaN
11 NaN
13 NaN
15 NaN
17 NaN
19 NaN
21 NaN
23 NaN
25 NaN
27 NaN
29 NaN
31 NaN
33 NaN
35 NaN
37 NaN
39 NaN
41 NaN
12868123 NaN
757E13D7 NaN
79731492 NaN
799EFB29 NaN
7D44062A NaN
7D4C0E2F NaN
7DBA169D NaN
7E558E2B NaN
7E8B3815 NaN
7EB80123 NaN
7FFB607D NaN
8065A321 NaN
80EFE91B NaN
811F1B1E NaN
82B67C94 NaN
833C27AE NaN
83957B28 NaN
8447B85D NaN
84877498 NaN
8569499D NaN
8617B7D9 NaN
dtype: float64
I tried resetting the index column, it didn't help. Not sure why it's not working.
Upvotes: 1
Views: 4529
Reputation: 251
I arrived here because I was looking how to divide a column by a subset of itself.
I found a solution which is not reported here
Suppose you have a df like
d = {'mycol1':[0,0,1,1,2,2],'mycol2':[1,2,3,6,4,8]}
df = pd.DataFrame(data=d)
i.e.
mycol1 mycol2
0 0 1
1 0 2
2 1 3
3 1 6
4 2 4
5 2 8
And now you want to divide mycol2
for a subset composed by the first two values
df['mycol2'].div(df[df['mycol1']==0.0]['mycol2'])
will result in
0 1.0
1 1.0
2 NaN
3 NaN
4 NaN
5 NaN
because of the index problem reported by jezreal.
The solution is to simply use concat
to concatenate the subset to match the length of the original df.
Nrows = df[df['mycol1'==0.0]]['mycol2'].shape[0]
Nrows_tot = df['mycol2'].shape[0]
times_longer = int(Nrows_tot/Nrows)
df['mycol3'] = df['mycol2'].div(pd.concat([df[df['mycol1']==0.0]['mycol2']]*times_longer,ignore_index=True))
Upvotes: 0
Reputation: 863166
Problem is with different index
values, because arithmetic opearations align Series
by indices, so need cast to numpy array
by values
:
print (Top/Bottom.values)
1 15.127942
3 3.538141
5 19.660552
7 -5.675464
9 10.920737
11 26.224126
13 -8.666359
15 -11.356216
17 12.690123
19 5.548426
21 23.894609
23 -9.705679
25 -3.906707
27 -6.413841
29 7.274324
31 -2.717031
33 7.093496
35 -16.111911
37 2.672858
39 -24.165198
41 -16.742573
Name: col, dtype: float64
Solution with div
:
print (Top.div(Bottom.values))
1 15.127942
3 3.538141
5 19.660552
7 -5.675464
9 10.920737
11 26.224126
13 -8.666359
15 -11.356216
17 12.690123
19 5.548426
21 23.894609
23 -9.705679
25 -3.906707
27 -6.413841
29 7.274324
31 -2.717031
33 7.093496
35 -16.111911
37 2.672858
39 -24.165198
41 -16.742573
dtype: float64
But if assign one index
values to another, you can use:
Top.index = Bottom.index
print (Top/Bottom)
ID
12868123 15.127942
757E13D7 3.538141
79731492 19.660552
799EFB29 -5.675464
7D44062A 10.920737
7D4C0E2F 26.224126
7DBA169D -8.666359
7E558E2B -11.356216
7E8B3815 12.690123
7EB80123 5.548426
7FFB607D 23.894609
8065A321 -9.705679
80EFE91B -3.906707
811F1B1E -6.413841
82B67C94 7.274324
833C27AE -2.717031
83957B28 7.093496
8447B85D -16.111911
84877498 2.672858
8569499D -24.165198
8617B7D9 -16.742573
dtype: float64
And if get error
like:
ValueError: operands could not be broadcast together with shapes (20,) (21,)
problem is with different length of Series
.
Upvotes: 3