Reputation: 273
I have the following df:
| day | first mover |
| -------- | -------------- |
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
| 4 | 0 |
| 5 | 0 |
| 6 | 1 |
| 7 | 0 |
| 8 | 1 |
I want to group this DataFrame from bottom to top in chunks of 4 rows. Furthermore, if the first row of a group is 1, all other entries in that group should become 0. Desired output:
| day | first mover |
| -------- | -------------- |
| 1 | 1 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |
| 5 | 0 |
| 6 | 0 |
| 7 | 0 |
| 8 | 0 |
I have accomplished the first half. I am confused about how to set the other entries to 0 if the first entry in each group is 1.
N = 4
df.iloc[::-1].groupby(np.arange(len(df)) // N)
Upvotes: 1
Views: 79
Reputation: 262194
You could use a groupby.transform:
group = (np.arange(len(df)) // 4)[::-1]
df['out'] = (df.groupby(group)['first mover']
               .transform(lambda x: np.r_[x.iloc[0], np.zeros(len(x)-1, x.dtype)])
             )
However, a much simpler and more efficient approach is to take every 4th row counting from the end (starting at position -4), then reindex with zeros:
df['out'] = (df['first mover'].iloc[-4::-4]
               .reindex(df.index, fill_value=0)
             )
Or, using a boolean mask and where:
# identify first of every 4 rows, from the end
m = np.arange(len(df)-1,-1,-1)%4 == 3
df['out'] = df['first mover'].where(m, 0)
Output:
day first mover out
0 1 1 1
1 2 1 0
2 3 0 0
3 4 0 0
4 5 0 0
5 6 1 0
6 7 0 0
7 8 1 0
Intermediates:
day first mover group iloc m out
0 1 1 1 1.0 True 1
1 2 1 1 NaN False 0
2 3 0 1 NaN False 0
3 4 0 1 NaN False 0
4 5 0 0 0.0 True 0
5 6 1 0 NaN False 0
6 7 0 0 NaN False 0
7 8 1 0 NaN False 0
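To make the comparison easier to reproduce, here is a minimal sketch that runs the three approaches above side by side. The helper name `three_ways` and the 10-row all-ones series are assumptions for illustration only, not part of the original answer. On the 8-row sample all three agree. When the number of rows is not a multiple of 4, the leftover rows at the top form a short group: the `groupby.transform` version keeps that group's first value, while the slice and mask versions set it to 0.

```python
import numpy as np
import pandas as pd


def three_ways(col, n=4):
    """Hypothetical helper: return the three results as plain lists."""
    # 1) groupby.transform, with groups numbered from the bottom
    group = (np.arange(len(col)) // n)[::-1]
    by_transform = col.groupby(group).transform(
        lambda x: np.r_[x.iloc[0], np.zeros(len(x) - 1, x.dtype)]
    )
    # 2) every n-th row counted from the end, everything else filled with 0
    by_slice = col.iloc[-n::-n].reindex(col.index, fill_value=0)
    # 3) boolean mask that marks the same rows as the slice
    m = np.arange(len(col) - 1, -1, -1) % n == n - 1
    by_mask = col.where(m, 0)
    return by_transform.tolist(), by_slice.tolist(), by_mask.tolist()


# The 8-row sample from the question: all three approaches agree.
sample = pd.Series([1, 1, 0, 0, 0, 1, 0, 1], name="first mover")
t8, s8, m8 = three_ways(sample)
print(t8, s8, m8)

# Assumed 10-row input (all ones) to expose the short group at the top.
ten = pd.Series([1] * 10, name="first mover")
t10, s10, m10 = three_ways(ten)
print(t10)  # row 0 is kept: it is the first row of the short top group
print(s10)  # row 0 is zeroed: only full steps of n from the end are kept
```

Which behavior is wanted for a short leading group depends on the data; the question's sample does not exercise that case.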
Upvotes: 2
Reputation: 1166
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame(
    {
        "day": [*range(1, 21)],
        "first mover": np.random.randint(0, 2, 20),
    }
)
# Result is 1 if day-1 is divisible by 4 (i.e. day is 1, 5, 9, ...)
# and 'first mover' == 1; otherwise it is 0
df['first mover edited'] = df.apply(
    lambda row: int((row['day'] - 1) % 4 == 0 and row['first mover'] == 1),
    axis=1,
)
df['group'] = (df['day']-1) // 4
df
| day | first mover | first mover edited | group |
|------:|--------------:|---------------------:|--------:|
| 1 | 1 | 1 | 0 |
| 2 | 0 | 0 | 0 |
| 3 | 1 | 0 | 0 |
| 4 | 0 | 0 | 0 |
| 5 | 1 | 1 | 1 |
| 6 | 1 | 0 | 1 |
| 7 | 0 | 0 | 1 |
| 8 | 1 | 0 | 1 |
| 9 | 0 | 0 | 2 |
| 10 | 0 | 0 | 2 |
| 11 | 0 | 0 | 2 |
| 12 | 0 | 0 | 2 |
| 13 | 1 | 1 | 3 |
| 14 | 0 | 0 | 3 |
| 15 | 1 | 0 | 3 |
| 16 | 0 | 0 | 3 |
| 17 | 0 | 0 | 4 |
| 18 | 1 | 0 | 4 |
| 19 | 0 | 0 | 4 |
| 20 | 0 | 0 | 4 |
You can overwrite the 'first mover' column directly instead; the separate column is only for demonstration.
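The same rule can also be written without a row-wise `apply`, which tends to matter on larger frames. This is a sketch under the same assumptions as the answer above: `day` starts at 1 and increases by 1, and groups run from the top, which matches bottom-up grouping only when the row count is a multiple of 4. The seed and the name `is_group_start` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only to make the run repeatable
df = pd.DataFrame(
    {
        "day": np.arange(1, 21),
        "first mover": rng.integers(0, 2, 20),
    }
)

# True on days 1, 5, 9, ... (the first row of each block of 4)
is_group_start = (df["day"] - 1) % 4 == 0

# Keep a 1 only where the row starts a block and already holds a 1
df["first mover edited"] = (is_group_start & (df["first mover"] == 1)).astype(int)
df["group"] = (df["day"] - 1) // 4
print(df)
```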
Upvotes: 1
Reputation: 142985
I would use a for-loop for this:
for name, group in df.groupby(...):
This way I can use if/else to run or skip some code.
To get first element in group:
(I don't know why, but .first() doesn't work as I expected; it asks for some offset.)
first_value = group.iloc[0]['first mover']
To get indexes of other rows (except first):
group.index[1:]
and use them to set 0 in the original df:
df.loc[group.index[1:], 'first mover'] = 0
Minimal working code which I used for tests:
import pandas as pd

df = pd.DataFrame({
    'day': [1, 2, 3, 4, 5, 6, 7, 8],
    'first mover': [1, 1, 0, 0, 0, 1, 0, 1],
})

N = 4

for name, group in df.groupby(by=lambda index: index // N):
    #print(f'\n---- group {name} ---\n')
    #print(group)
    first_value = group.iloc[0]['first mover']
    #print('first value:', first_value)
    if first_value == 1:
        #print('>>> change:', group.index[1:])
        df.loc[group.index[1:], 'first mover'] = 0

print('\n--- df ---\n')
print(df)
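For comparison, the loop above can be expressed without explicit iteration. The sketch below is one possible loop-free equivalent, not the answer's own method; the function name `zero_after_leading_one` is made up for illustration. It groups from the top by row position, as the loop does, and works on a copy. On the sample data the second group is left unchanged because its first entry is 0.

```python
import numpy as np
import pandas as pd


def zero_after_leading_one(df, col="first mover", n=4):
    """Hypothetical helper: zero all but the first row of each group that starts with 1."""
    out = df.copy()
    key = np.arange(len(out)) // n            # group label by row position, top to bottom
    grouped = out.groupby(key)[col]
    first_is_one = grouped.transform("first") == 1   # broadcast each group's first value
    not_first = grouped.cumcount() > 0                # every row except the group's first
    out.loc[first_is_one & not_first, col] = 0
    return out


df = pd.DataFrame({
    "day": [1, 2, 3, 4, 5, 6, 7, 8],
    "first mover": [1, 1, 0, 0, 0, 1, 0, 1],
})
result = zero_after_leading_one(df)
print(result)
```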
Upvotes: 1