Reputation: 356
Here's an example of DF:
EC1 EC2 CDC L1 L2 L3 L4 L5 L6 VNF
0 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [1, 0]
1 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1]
2 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [-1, 0]
3 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, -1]
4 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [1, 0]
5 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [0, 1]
6 [1, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [-1, 0]
How to delete those rows where df['VNF'] = [-1, 0] or [0, -1] and df['EC1'], df['EC2'] and df['CDC'] has a value of 0 in the same index position as the -1 in df['VNF'])?
The expected result would be:
EC1 EC2 CDC L1 L2 L3 L4 L5 L6 VNF
0 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [1, 0]
1 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1]
2 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [1, 0]
3 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [0, 1]
4 [1, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [-1, 0]
Here's the constructor for the DataFrame:
data = {'EC1': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [1, 0]],
'EC2': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
'CDC': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]],
'L1': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
'L2': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
'L3': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
'L4': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]],
'L5': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]],
'L6': [[0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]],
'VNF': [[1, 0], [0, 1], [-1, 0], [0, -1], [1, 0], [0, 1], [-1, 0]]}
Upvotes: 2
Views: 65
Reputation: 18315
List comprehension to find which indexes to drop might help see the conditions more directly:
columns = df.EC1, df.EC2, df.CDC, df.VNF
inds_to_drop = [iloc
for iloc, (ec1, ec2, cdc, vnf) in enumerate(zip(*columns))
if vnf == [-1, 0] or vnf == [0, -1]
if all(val[idx] == 0
for idx in (vnf.index(-1),) for val in (ec1, ec2, cdc))]
new_df = df.drop(df.index[inds_to_drop])
to get
>>> new_df
EC1 EC2 CDC L1 L2 L3 L4 L5 L6 VNF
0 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [1, 0]
1 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1]
4 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [1, 0]
5 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [0, 1]
6 [1, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1] [-1, 0]
The list comprehension starts with the outer for loop over the column values and their integer indexes:
for iloc, (ec1, ec2, cdc, vnf) in enumerate(zip(*columns))
Then the first condition to drop kicks in:
df['VNF'] = [-1, 0] or [0, -1]
if vnf == [-1, 0] or vnf == [0, -1]
And the second condition:
df['EC1'], df['EC2'] and df['CDC'] has a value of 0 in the same index position as the -1 in df['VNF'])
if all(val[idx] == 0 for idx in (vnf.index(-1),) for val in (ec1, ec2, cdc))
Here, we check if all of the values of 3 columns satisfy the criterion. A trick here is 1-turn loop for idx in (vnf.index(-1),)
so as to evaluate the index of -1 only once (compare with val[vnf.index(-1)] for val in (ec1, ec2, cdc)
; less efficient).
Then the list is comprehended with the integer index locations of rows to drop:
>>> inds_to_drop
[2, 3]
If you have a RangeIndex, i.e., 0..N-1 kind of index, then you can directly say new_df = df.drop(inds_to_drop)
. But if custom index (e.g., ["a", "d", "e", "f"]
), we lookup the real index labels with df.index[inds_to_drop]
and then drop (would be "e", "f"
); this covers all cases.
Upvotes: 1
Reputation:
You can explode every column of df
, then identify the elements satisfying the first (sum of "VNF" values must be -1) and second condition and filter out the elements that satisfy both conditions to create temp
. Then since each cell must have two elements, you can count whether each index contains 2 elements by transforming count
, then filter the rows with two indices and groupby
the index and aggregate to list:
exploded = df.explode(df.columns.tolist())
first_cond = exploded.groupby(level=0)['VNF'].transform('sum').eq(-1)
second_cond = exploded['VNF'].eq(-1) & exploded['EC1'].eq(0) & exploded['EC2'].eq(0) & exploded['CDC'].eq(0)
temp = exploded[~(first_cond & second_cond)]
out = temp[temp.groupby(level=0)['VNF'].transform('count').gt(1)].groupby(level=0).agg(list).reset_index(drop=True)
Output:
EC1 EC2 CDC L1 L2 L3 L4 L5 L6 \
0 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0]
1 [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0] [0, 0]
2 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1]
3 [0, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1]
4 [1, 0] [0, 0] [0, 1] [0, 0] [0, 0] [0, 0] [0, 0] [0, 1] [0, 1]
VNF
0 [1, 0]
1 [0, 1]
2 [1, 0]
3 [0, 1]
4 [-1, 0]
Upvotes: 1