Chopin
Chopin

Reputation: 214

Drop rows using two conditionals - pandas

I've got a df below with numerous duplicate values. Using below, I'm aiming to drop rows where Value is unique compared to the previous rows and Group is equal to C.

Further, where this occurs I want to remove all previous duplicate rows.

d = {'Item': ["Red", "Red", "Red", "Green", "Green", "Red", "Red", "Red", "Green", "Green", "Green", "Green", "Red", "Red", "Red", "Green"],
     'Value': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6],
     'Group': ["A", "B", "B", "C", "D", "D", "A", "B", "C", "D", "E", "E", "B", "B", "D", "D"],
    }

df = pd.DataFrame(data=d)

mask = (df['Item'].isin(['Green'])) & (df.Value.eq(df.Value.shift(-1)))

df = df[~mask]

The input is:

     Item  Value Group
0     Red      1     A
1     Red      1     B
2     Red      1     B
3   Green      2     C
4   Green      2     D
5     Red      3     D
6     Red      3     A
7     Red      3     B
8   Green      4     C
9   Green      4     D
10  Green      4     E
11  Green      4     E
12    Red      5     B
13    Red      5     B
14    Red      5     D
15  Green      6     D

intended output:

     Item  Value Group
0     Red      1     A
4   Green      2     D
5     Red      3     D
6     Red      3     A
9   Green      4     D
10  Green      4     E
11  Green      4     E
12    Red      5     B
13    Red      5     B
14    Red      5     D
15  Green      6     D

current output:

     Item  Value Group
0     Red      1     A
1     Red      1     B
2     Red      1     B
4   Green      2     D
5     Red      3     D
6     Red      3     A
7     Red      3     B
11  Green      4     E
12    Red      5     B
13    Red      5     B
14    Red      5     D
15  Green      6     D

Upvotes: 1

Views: 242

Answers (3)

Anurag Dabas
Anurag Dabas

Reputation: 24304

IIUC:

Another possible way:

out=df.shift(-1)
#shifting the dataframe
cond=df['Value'].ne(out['Value']) & out['Group'].eq('C') 
#your initial condition
df=df[~cond].drop(df[cond].index+1)
#filtering out data

output of df:

    Item    Value   Group
0   Red     1   A
3   Green   2   D
4   Red     3   E
5   Red     3   A
6   Red     3   B
7   Green   4   B
8   Green   4   D
9   Green   4   E
10  Green   4   A
11  Red     5   B
12  Red     5   C
13  Red     5   D
14  Green   6   E

Update:

IIUC:

Try:

create a custom function which will calculate your conditions and give the index of rows where the condition match

def getindicies():
    out=df.shift(-1)
    cond=df['Value'].ne(out['Value']) & out['Group'].eq('C') 
    idx=df.groupby('Group').filter(lambda x:x.duplicated(keep=False).any()).index
    to_drop=df.loc[:(df[cond].index+1)[-1]]
    return to_drop[to_drop.index.isin(idx)].index.tolist()+(df[cond].index+1).tolist()

#Finally call the function and remove those values:
df=df.drop(getindicies())

If you print df you will get your desired output

OR

let's extend the logic of this answer:

cond = df.Value.diff().ne(0) & df.Group.eq("C")
duplicates=df.groupby('Group').transform(lambda x:x.duplicated(keep=False).any()).all(1)
to_drop = (cond | cond.shift(-1)) | duplicates
df=df[~to_drop].loc[:df[cond].index[-1]].append(df.loc[df[cond].index[-1]+1:])
#OR
df=pd.concat([df[~to_drop].loc[:df[cond].index[-1]],df.loc[df[cond].index[-1]+1:]])

If you print df you will get your desired output

Upvotes: 1

wwnde
wwnde

Reputation: 26676

another way.

m=df['Value'].duplicated(keep='last')&df['Group'].ne('C')
n=(df['Group'].ne('C')&df['Group'].shift(-1).eq('C'))|(df['Group'].eq('C'))
df[~(m&n)]

Upvotes: 1

Mustafa Aydın
Mustafa Aydın

Reputation: 18296

# form the condition
cond = df.Value.diff().ne(0) & df.Group.eq("C")

# also consider the previous row
to_drop = cond | cond.shift(-1)

# index with inverse of the mask
new_df = df[~to_drop]

Locations where Value is not equal to the previous one can be found via looking at the difference not being 0:

df.Value.diff().ne(0)

and Group being "C" is found with

df.Group.eq("C")

Combining them with "and":

cond = df.Value.diff().ne(0) & df.Group.eq("C")

Since you want to drop the previous row too, we can "or" with the shifted version of this:

to_drop = cond | cond.shift(-1)

which gives

>>> to_drop

0     False
1      True
2      True
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False

Therefore final operation is indexing with inverse of this:

>>> new_df = df[~to_drop]
>>> new_df

     Item  Value Group
0     Red      1     A
3   Green      2     D
4     Red      3     E
5     Red      3     A
6     Red      3     B
7   Green      4     B
8   Green      4     D
9   Green      4     E
10  Green      4     A
11    Red      5     B
12    Red      5     C
13    Red      5     D
14  Green      6     E

Upvotes: 3

Related Questions