Reputation: 1663

Pandas replace all but first in consecutive group

The problem description is simple, but I cannot figure out how to make this work in Pandas. Basically, I'm trying to replace consecutive values (except the first) with some replacement value. For example:

import pandas as pd

data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}

df = pd.DataFrame.from_dict(data)


    A
0   0
1   1
2   1
3   1
4   0
5   0
6   0
7   0
8   2
9   2
10  2
11  2
12  3

If I run this through some function foo(df, 2, 0), I would get the following:

    A
0   0
1   1
2   1
3   1
4   0
5   0
6   0
7   0
8   2
9   0
10  0
11  0
12  3

This replaces all values of 2 with 0, except for the first one. Is this possible?

Upvotes: 0

Answers (4)

Scott Boston

Reputation: 153500

Try this, if 'A' is duplicated further down the dataframe and is monotonic increasing:

def foo(df, val=2, repl=0):
  return df.mask((df.groupby('A').transform('cumcount') > 0) & (df['A'] == val), repl)

foo(df, 2, 0)

Output:

    A
0   0
1   1
2   1
3   1
4   0
5   0
6   0
7   0
8   2
9   0
10  0
11  0
12  3

Upvotes: 1

jhso

Reputation: 3283

I solved this by shifting the column down by one row and checking whether each value matches the one above it. The function below can also check for multiple values (not just 2).

import pandas as pd
data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}

df = pd.DataFrame(data)
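def replace_recurring(df, key, offset=1, values=(2,)):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    # True where a value equals the one `offset` rows above it
    repeated = df[key] == df[key].shift(offset)
    # Zero out repeats, but only for the values we were asked to check
    df.loc[repeated & df[key].isin(values), key] = 0
    return df

df = replace_recurring(df, 'A', offset=1, values=[2])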

Giving the output:

    A
0   0
1   1
2   1
3   1
4   0
5   0
6   0
7   0
8   2
9   0
10  0
11  0
12  3

Upvotes: 0

Nick

Reputation: 147206

You can find all the rows where A = 2 and A is also equal to the previous A value and set them to 0:

data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}

df = pd.DataFrame.from_dict(data)
df[(df.A == 2) & (df.A == df.A.shift(1))] = 0

Output:

    A
0   0
1   1
2   1
3   1
4   0
5   0
6   0
7   0
8   2
9   0
10  0
11  0
12  3

If you have more than one column in the dataframe, use df.loc to just set the A values:

df.loc[(df.A == 2) & (df.A == df.A.shift(1)), 'A'] = 0
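
To match the foo(df, 2, 0) call from the question, the same comparison against the previous row can be wrapped in a function. This is only a minimal sketch: the `col` parameter and its default of "A" are assumptions added for illustration, and the function returns a copy rather than modifying df in place.

```python
import pandas as pd


def foo(df, val, repl, col="A"):
    """Replace every repeat of `val` within a consecutive run in `col`.

    The first value of each run is kept. `col` is an assumed extra
    parameter, not part of the call shown in the question.
    """
    out = df.copy()
    # True where the row holds `val` and the previous row held the same value
    is_repeat = (out[col] == val) & (out[col] == out[col].shift(1))
    out.loc[is_repeat, col] = repl
    return out


df = pd.DataFrame({"A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]})
result = foo(df, 2, 0)
print(result["A"].tolist())
# [0, 1, 1, 1, 0, 0, 0, 0, 2, 0, 0, 0, 3]
```

Because each row is compared only with the row directly above it, a value that reappears in a later, separate run keeps the first element of that run as well.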

Upvotes: 2

TheFaultInOurStars

Reputation: 3608

I'm not sure if this is the best way, but I came up with this solution. I hope it helps:

import pandas as pd
data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}

df = pd.DataFrame(data)
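def replecate(df, number, replacement):
    for column in df.columns:
        first_seen = False  # reset for every column
        for index, value in df[column].items():
            if value != number:
                continue
            if first_seen:
                # df.at avoids chained assignment, which may not write back to df
                df.at[index, column] = replacement
            else:
                first_seen = True  # keep the first occurrence in this column
    return df

replecate(df, 2, 0)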

Output:

    A
0   0
1   1
2   1
3   1
4   0
5   0
6   0
7   0
8   2
9   0
10  0
11  0
12  3

Upvotes: 0
