rwolst
rwolst

Reputation: 13662

Use cumcount on pandas dataframe with a conditional increment

Consider the dataframe

df = pd.DataFrame(
    [
        ['A', 1],
        ['A', 1],
        ['B', 1],
        ['B', 0],
        ['A', 0],
        ['A', 1],
        ['B', 1]
    ], columns = ['key', 'cond'])

I want to find a cumulative (running) count (starting at 1) for each key, where we only increment if the previous value in the group had cond == 1. When appended to the above dataframe this would give

df_result = pd.DataFrame(
    [
        ['A', 1, 1],
        ['A', 1, 2],
        ['B', 1, 1],
        ['B', 0, 2],
        ['A', 0, 3],
        ['A', 1, 3],
        ['B', 1, 2]
    ], columns = ['key', 'cond'])

Note that essentially the cond values of the last rows in each key group have no effect.

Just doing a simple group and cumcount

df.groupby('key').cumcount()

of course doesn't account for the cond value of the previous element. How can I take this into account?

EDIT

As some of the solutions below don't work on some edge cases, I will give a more comprehensive dataframe for testing.

df = pd.DataFrame(
    [
        ['A', 0],
        ['A', 1],
        ['A', 1],
        ['B', 1],
        ['B', 0],
        ['A', 0],
        ['A', 1],
        ['B', 1],
        ['B', 0]
    ], columns = ['key', 'cond'])

which when appending the true result should give

df_result = pd.DataFrame(
    [
        ['A', 0, 1],
        ['A', 1, 1],
        ['A', 1, 2],
        ['B', 1, 1],
        ['B', 0, 2],
        ['A', 0, 3],
        ['A', 1, 3],
        ['B', 1, 2],
        ['B', 0, 3]
    ], columns = ['key', 'cond'])

Upvotes: 5

Views: 11160

Answers (3)

cs95
cs95

Reputation: 402323

Use groupby with combination of shift and cumsum.

df['new'] = df.groupby('key').cond.apply(
        lambda x: x.shift().fillna(1).cumsum()
).astype(int)

df
  key  cond  new
0   A     1    1
1   A     1    2
2   B     1    1
3   B     0    2
4   A     0    3
5   A     1    3
6   B     1    2

Upvotes: 8

jezrael
jezrael

Reputation: 862511

Use groupby with custom lambda function with shift for previous values, replace first NaN by back filling if possible first value is 0 per key and last cumsum with convert to int:

df['new'] = df.groupby('key')['cond'].apply(lambda x: x.shift().bfill().cumsum()).astype(int)

print (df)
  key  cond  new
0   A     1    1
1   A     1    2
2   B     1    1
3   B     0    2
4   A     0    3
5   A     1    3
6   B     1    2

Upvotes: 2

Amadan
Amadan

Reputation: 198314

df['cnt'] = df[df["cond"]==1].groupby('key').cumcount()+1
df['cnt'] = df.groupby('key')['cnt'].fillna(method='bfill')
df
# =>   key  cond  cnt
#    0   A     1  1.0
#    1   A     1  2.0
#    2   B     1  1.0
#    3   B     0  2.0
#    4   A     0  3.0
#    5   A     1  3.0
#    6   B     1  2.0

Upvotes: 4

Related Questions