prem

Reputation: 273

Pandas groupby make all elements 0 if first element is 1

I have the following df:

| day      | first mover    |
| -------- | -------------- |
| 1        |     1        |
| 2        |     1        |
| 3        |     0        |
| 4        |     0        |
| 5        |     0        |
| 6        |     1        |
| 7        |     0        |
| 8        |     1        |

I want to group this DataFrame from bottom to top in blocks of 4 rows. Furthermore, if the first row of a group is 1, all other entries in that group should become 0. Desired output:

| day      | first mover    |
| -------- | -------------- |
| 1        |     1        |
| 2        |     0        |
| 3        |     0        |
| 4        |     0        |
| 5        |     0        |
| 6        |     0        |
| 7        |     0        |
| 8        |     0        |

I have accomplished the first half. I am confused about how to set the other entries to 0 if the first entry in each group is 1.

N = 4
# reverse the rows, then label them in consecutive blocks of N
df.iloc[::-1].groupby(np.arange(len(df)) // N)

Upvotes: 1

Views: 79

Answers (3)

mozway

Reputation: 262194

You could use a groupby.transform:

group = (np.arange(len(df))//4)[::-1]
df['out'] = (df.groupby(group)['first mover']
               .transform(lambda x: np.r_[x.iloc[0], np.zeros(len(x)-1, x.dtype)])
             )

However, a much simpler and more efficient approach is to take every 4th row counting from the end (starting at position -4), then reindex, filling the missing rows with zeros:

df['out'] = (df['first mover'].iloc[-4::-4]
             .reindex(df.index, fill_value=0)
             )

or, using a boolean mask and where:

# identify first of every 4 rows, from the end
m = np.arange(len(df)-1,-1,-1)%4 == 3
df['out'] = df['first mover'].where(m, 0)

Output:

   day  first mover  out
0    1            1    1
1    2            1    0
2    3            0    0
3    4            0    0
4    5            0    0
5    6            1    0
6    7            0    0
7    8            1    0

Intermediates:

   day  first mover  group  iloc      m  out
0    1            1      1   1.0   True    1
1    2            1      1   NaN  False    0
2    3            0      1   NaN  False    0
3    4            0      1   NaN  False    0
4    5            0      0   0.0   True    0
5    6            1      0   NaN  False    0
6    7            0      0   NaN  False    0
7    8            1      0   NaN  False    0
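
For anyone who wants to reproduce the comparison, here is a minimal self-contained sketch that rebuilds the sample frame from the question and runs the three variants side by side. The names `N`, `out_transform`, `out_slice` and `out_mask` are illustrative and not part of the snippets above.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "day": range(1, 9),
    "first mover": [1, 1, 0, 0, 0, 1, 0, 1],
})
N = 4

# 1) groupby.transform: group labels are numbered from the bottom
group = (np.arange(len(df)) // N)[::-1]
out_transform = df.groupby(group)["first mover"].transform(
    lambda x: np.r_[x.iloc[0], np.zeros(len(x) - 1, x.dtype)]
)

# 2) slice every N-th row from the end, then fill the rest with zeros
out_slice = df["first mover"].iloc[-N::-N].reindex(df.index, fill_value=0)

# 3) boolean mask marking the first row of every block, counted from the end
m = np.arange(len(df) - 1, -1, -1) % N == N - 1
out_mask = df["first mover"].where(m, 0)

print(pd.DataFrame({"transform": out_transform, "slice": out_slice, "mask": out_mask}))
```

On the 8-row sample all three give the same column. When the number of rows is not a multiple of `N`, the short group at the top is handled differently: the `groupby.transform` variant keeps that group's first row, while the slice and mask variants set it to 0, so it is worth deciding which behaviour is wanted.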

Upvotes: 2

Hanna

Reputation: 1166

import pandas as pd
import numpy as np

# Create a sample DataFrame
df = pd.DataFrame(
    {
        "day": [*range(1, 21)],
        "first mover": np.random.randint(0, 2, 20),
    }
)

# If (day - 1) is divisible by 4 (days 1, 5, 9, ...)
# and 'first mover' == 1,
# the result is 1; otherwise it is 0.

df['first mover edited'] = df.apply(
    lambda row: int((row['day'] - 1) % 4 == 0 and row['first mover'] == 1),
    axis=1,
)

df['group'] = (df['day']-1) // 4

df
|   day |   first mover |   first mover edited |   group |
|------:|--------------:|---------------------:|--------:|
|     1 |             1 |                    1 |       0 |
|     2 |             0 |                    0 |       0 |
|     3 |             1 |                    0 |       0 |
|     4 |             0 |                    0 |       0 |
|     5 |             1 |                    1 |       1 |
|     6 |             1 |                    0 |       1 |
|     7 |             0 |                    0 |       1 |
|     8 |             1 |                    0 |       1 |
|     9 |             0 |                    0 |       2 |
|    10 |             0 |                    0 |       2 |
|    11 |             0 |                    0 |       2 |
|    12 |             0 |                    0 |       2 |
|    13 |             1 |                    1 |       3 |
|    14 |             0 |                    0 |       3 |
|    15 |             1 |                    0 |       3 |
|    16 |             0 |                    0 |       3 |
|    17 |             0 |                    0 |       4 |
|    18 |             1 |                    0 |       4 |
|    19 |             0 |                    0 |       4 |
|    20 |             0 |                    0 |       4 |

You can overwrite the 'first mover' column directly; the separate column here is only for demonstration.
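
As a hedged alternative, the same rule can probably be written without a row-wise `apply` by combining two boolean columns. This is a sketch rather than the code above: the seeded generator and the names `is_block_start` and `row_wise` are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Reproducible sample data (the seed is an arbitrary choice)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "day": np.arange(1, 21),
    "first mover": rng.integers(0, 2, 20),
})

# True on the first day of each block of 4, counted from the top (days 1, 5, 9, ...)
is_block_start = (df["day"] - 1) % 4 == 0

# Keep a 1 only where a block starts with a 1; everything else becomes 0
df["first mover edited"] = (is_block_start & (df["first mover"] == 1)).astype(int)

# Row-wise version of the same rule, for comparison
row_wise = df.apply(
    lambda row: int((row["day"] - 1) % 4 == 0 and row["first mover"] == 1),
    axis=1,
)
print((df["first mover edited"] == row_wise).all())
```

Both versions count blocks from the top of the frame, which coincides with counting from the bottom only when the number of rows is a multiple of 4.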

Upvotes: 1

furas

Reputation: 142985

I would use a for-loop for this:

for name, group in df.groupby(...):

This way I can use if/else to run or skip some code.

To get the first element in a group:
(I don't know why, but .first() doesn't work as I expected; it asks for an offset.)

first_value = group.iloc[0]['first mover']
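
A possible reason, offered as an assumption about what was tried: inside the loop `group` is an ordinary DataFrame, and `DataFrame.first` is a time-series method that expects a date offset, whereas the per-group first value comes from calling `.first()` on the GroupBy object itself. A small sketch (the name `firsts` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'day': [1, 2, 3, 4, 5, 6, 7, 8],
    'first mover': [1, 1, 0, 0, 0, 1, 0, 1],
})
N = 4

# GroupBy.first() returns the first non-null value of each group
firsts = df.groupby(by=lambda index: index // N)['first mover'].first()
print(firsts)
```

Note that `GroupBy.first()` skips missing values, so it can differ from `group.iloc[0]` when the first row of a group is NaN.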

To get the indexes of the other rows (all except the first):

group.index[1:]

and use them to set 0 in the original df:

df.loc[group.index[1:], 'first mover'] = 0

Minimal working code that I used for testing:

import pandas as pd

df = pd.DataFrame({
    'day': [1, 2, 3, 4, 5, 6, 7, 8],
    'first mover': [1, 1, 0, 0, 0, 1, 0, 1],
})

N = 4

for name, group in df.groupby(by=lambda index: index // N):
    #print(f'\n---- group {name} ---\n')
    #print(group)

    first_value = group.iloc[0]['first mover']
    #print('first value:', first_value)

    if first_value == 1:
        #print('>>> change:', group.index[1:])
        df.loc[group.index[1:], 'first mover'] = 0

print('\n--- df ---\n')
print(df)
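
The loop above counts groups from the top and only changes a group whose first value is 1, so on the sample data days 6 and 8 stay at 1. A hedged variant, assuming the goal is the question's desired output (groups counted from the last row, every non-first row set to 0), might look like this. The `labels` array is an illustrative addition.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'day': [1, 2, 3, 4, 5, 6, 7, 8],
    'first mover': [1, 1, 0, 0, 0, 1, 0, 1],
})

N = 4

# Label rows so that group 0 is the last N rows, group 1 the N rows before, ...
labels = (np.arange(len(df)) // N)[::-1]

for name, group in df.groupby(labels):
    # Keep the first row of the group as it is, zero the rest
    df.loc[group.index[1:], 'first mover'] = 0

print(df)
```

Dropping the `if` is a design choice here: when the first value is already 0, zeroing the remaining rows is what produces the all-zero second block in the desired output.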

Upvotes: 1
