I want to use every 5th row as a reference row (ref_row): divide each row by this ref_row, starting from the ref_row itself and continuing through the next 4 rows, then repeat for the next group of 5.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
len = df.shape[0]
for idx in range(0,len,5):
    ref_row = df.iloc[idx:idx+1,:]
    for idx_next in range(idx,idx+5):
        df.iloc[idx_next:idx_next+1,:] = df.iloc[idx_next:idx_next+1,:].div(ref_row)
However, I got all NaN except for the ref_row.
A B C D
0 1.0 1.0 1.0 1.0
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
... ... ... ... ...
95 1.0 1.0 1.0 1.0
96 NaN NaN NaN NaN
97 NaN NaN NaN NaN
98 NaN NaN NaN NaN
99 NaN NaN NaN NaN
Any idea what's wrong?
Upvotes: 0
Views: 1302
The problem with your code is that df.iloc[idx_next:idx_next+1,:] and df.iloc[idx:idx+1,:] select rows as one-row DataFrame objects, each of which keeps its original row label. DataFrame division aligns on the index, so the row labels don't match and you get NaN (except where a row is divided by itself). Replace df.iloc[idx_next:idx_next+1,:] with df.iloc[idx_next] and df.iloc[idx:idx+1,:] with df.iloc[idx] everywhere, and it will work as expected: these are Series objects indexed by the column names, so the indices match.
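As a minimal sketch of that fix, the snippet below first reproduces the label mismatch and then runs the loop with Series rows. The seeded generator, the cast to float, the .copy() on the reference row, and the names original and mismatched are my own additions for illustration, not part of the question's code; the cast and the copy are precautions so that writing the results back cannot change ref_row or hit a dtype conflict.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Values start at 1 so no reference row contains a zero (assumption for the demo).
df = pd.DataFrame(rng.integers(1, 100, size=(100, 4)), columns=list("ABCD")).astype(float)
original = df.copy()

# One-row DataFrames keep their row labels (1 and 0 here), so div() aligns
# on labels that never overlap and every cell comes out NaN.
mismatched = original.iloc[1:2].div(original.iloc[0:1])

n_rows = len(df)
for idx in range(0, n_rows, 5):
    # A Series indexed by column name; copied so later writes cannot alter it.
    ref_row = df.iloc[idx].copy()
    for idx_next in range(idx, min(idx + 5, n_rows)):
        df.iloc[idx_next] = df.iloc[idx_next].div(ref_row)
```

After the loop, every fifth row of df is all 1.0 and the rows in between hold the ratios to their reference row.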
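You can also avoid the loop: take every fifth row of the DataFrame as an array, repeat each of those rows 5 times using np.repeat on axis=0, then divide the DataFrame element-wise by the resulting array (this assumes the number of rows is a multiple of 5):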
out = df.div(np.repeat(df[::5].to_numpy(), 5, axis=0))
Output:
A B C D
0 1.000000 1.000000 1.000000 1.000000
1 0.726190 0.359375 0.967742 1.644068
2 0.130952 0.046875 0.161290 0.406780
3 0.488095 0.312500 0.919355 0.305085
4 0.857143 0.203125 0.967742 0.525424
.. ... ... ... ...
95 1.000000 1.000000 1.000000 1.000000
96 0.061224 1.400000 0.518519 0.882353
97 1.510204 1.300000 1.740741 5.588235
98 0.224490 2.100000 1.407407 0.294118
99 1.061224 1.400000 1.388889 3.411765
[100 rows x 4 columns]
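If the number of rows is not a multiple of 5, the repeated array no longer has the same shape as the DataFrame. One possible variant, sketched here under that assumption, indexes the reference rows by group number instead of repeating them; the 103-row frame and the names ref and group are hypothetical and only serve the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# 103 rows on purpose: the last group has only 3 rows.
df = pd.DataFrame(rng.integers(1, 100, size=(103, 4)), columns=list("ABCD"))

ref = df.iloc[::5].to_numpy()       # one reference row per group of 5
group = np.arange(len(df)) // 5     # group number of each row: 0,0,0,0,0,1,...
out = df.div(ref[group])            # ref[group] has the same shape as df
```

Here ref[group] picks, for each row, the reference row of its own group, so the trailing partial group is handled without any special case.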
Upvotes: 2