krm76
krm76

Reputation: 356

How to delete DF rows based on multiple column conditions?

Here's an example of DF:

        EC1     EC2     CDC      L1      L2      L3      L4      L5      L6      VNF
0    [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [1, 0]
1    [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [0, 1]
2    [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [-1, 0]
3    [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, -1]
4    [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [1, 0]
5    [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [0, 1]
6    [1, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]  [-1, 0]

How to delete those rows where df['VNF'] = [-1, 0] or [0, -1] and df['EC1'], df['EC2'] and df['CDC'] has a value of 0 in the same index position as the -1 in df['VNF'])?

The expected result would be:

        EC1     EC2     CDC      L1      L2      L3      L4      L5      L6      VNF
0    [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [1, 0]
1    [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [0, 1]
2    [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [1, 0]
3    [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [0, 1]
4    [1, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]  [-1, 0]

Here's the constructor for the DataFrame:

data = {'EC1': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [1, 0]],
 'EC2': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
 'CDC': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]],
 'L1': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
 'L2': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
 'L3': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
 'L4': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
 'L5': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]],
 'L6': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]],
 'VNF': [[1, 0], [0, 1], [-1, 0], [0, -1], [1, 0], [0, 1], [-1, 0]]}

Upvotes: 2

Views: 65

Answers (2)

Mustafa Aydın
Mustafa Aydın

Reputation: 18315

List comprehension to find which indexes to drop might help see the conditions more directly:

columns = df.EC1, df.EC2, df.CDC, df.VNF

inds_to_drop = [iloc
                for iloc, (ec1, ec2, cdc, vnf) in enumerate(zip(*columns))
                if vnf == [-1, 0] or vnf == [0, -1]
                if all(val[idx] == 0
                       for idx in (vnf.index(-1),) for val in (ec1, ec2, cdc))]

new_df = df.drop(df.index[inds_to_drop])

to get

>>> new_df

      EC1     EC2     CDC      L1      L2      L3      L4      L5      L6      VNF
0  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [1, 0]
1  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   [0, 1]
4  [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [1, 0]
5  [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   [0, 1]
6  [1, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]  [-1, 0]

The list comprehension starts with the outer for loop over the column values and their integer indexes:

for iloc, (ec1, ec2, cdc, vnf) in enumerate(zip(*columns))

Then the first condition to drop kicks in:

df['VNF'] = [-1, 0] or [0, -1]

if vnf == [-1, 0] or vnf == [0, -1]

And the second condition:

df['EC1'], df['EC2'] and df['CDC'] has a value of 0 in the same index position as the -1 in df['VNF'])

if all(val[idx] == 0 for idx in (vnf.index(-1),) for val in (ec1, ec2, cdc))

Here, we check if all of the values of 3 columns satisfy the criterion. A trick here is 1-turn loop for idx in (vnf.index(-1),) so as to evaluate the index of -1 only once (compare with val[vnf.index(-1)] for val in (ec1, ec2, cdc); less efficient).

Then the list is comprehended with the integer index locations of rows to drop:

>>> inds_to_drop
[2, 3]

If you have a RangeIndex, i.e., 0..N-1 kind of index, then you can directly say new_df = df.drop(inds_to_drop). But if custom index (e.g., ["a", "d", "e", "f"]), we lookup the real index labels with df.index[inds_to_drop] and then drop (would be "e", "f"); this covers all cases.

Upvotes: 1

user7864386
user7864386

Reputation:

You can explode every column of df, then identify the elements satisfying the first (sum of "VNF" values must be -1) and second condition and filter out the elements that satisfy both conditions to create temp. Then since each cell must have two elements, you can count whether each index contains 2 elements by transforming count, then filter the rows with two indices and groupby the index and aggregate to list:

exploded = df.explode(df.columns.tolist())
first_cond = exploded.groupby(level=0)['VNF'].transform('sum').eq(-1)
second_cond = exploded['VNF'].eq(-1) & exploded['EC1'].eq(0) & exploded['EC2'].eq(0) & exploded['CDC'].eq(0)

temp = exploded[~(first_cond & second_cond)]
out = temp[temp.groupby(level=0)['VNF'].transform('count').gt(1)].groupby(level=0).agg(list).reset_index(drop=True)

Output:

      EC1     EC2     CDC      L1      L2      L3      L4      L5      L6  \
0  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   
1  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 0]   
2  [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   
3  [0, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   
4  [1, 0]  [0, 0]  [0, 1]  [0, 0]  [0, 0]  [0, 0]  [0, 0]  [0, 1]  [0, 1]   

       VNF  
0   [1, 0]  
1   [0, 1]  
2   [1, 0]  
3   [0, 1]  
4  [-1, 0]  

Upvotes: 1

Related Questions