Zozzoy

Reputation: 63

Remove duplicates of list based on condition

Say, I have the following two lists:

list1 = ['A', 'A', 'B', 'B', 'C', 'D']
list2 = ['x', 'y', 'y', 'x', 'x', 'y']

I want to remove duplicates from list1 (together with their corresponding elements in list2), keeping for each duplicated value the occurrence whose corresponding element in list2 is 'y'.

Expected outcome:

list1 = ['A', 'B', 'C', 'D']
list2 = ['y', 'y', 'x', 'y']

The final goal is to continue working with the returned indices; for the example above that would be:

index = [1, 2, 4, 5]

I tried solving this by using pandas:

df = pd.DataFrame(zip(list1, list2), columns=["l1", "l2"])
df = df[(~(df.duplicated(['l1']))) | (df.duplicated(['l1']) & df.l2.eq('y'))]

But this does not give me the correct result. Note that I cannot simply drop the first or last occurrence, since 'x' and 'y' do not appear in a fixed order.

A solution with pandas would be fine, but is not necessary; a solution with a list comprehension would also work.
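For reference, the pure-Python route can be sketched with `collections.Counter`: keep index `i` when `list1[i]` is unique, or when its paired value is `'y'` (the names `index`, `new1`, `new2` are illustrative):

```python
from collections import Counter

list1 = ['A', 'A', 'B', 'B', 'C', 'D']
list2 = ['x', 'y', 'y', 'x', 'x', 'y']

counts = Counter(list1)
# keep index i if list1[i] occurs only once, or its paired value is 'y'
index = [i for i, (a, b) in enumerate(zip(list1, list2))
         if counts[a] == 1 or b == 'y']
new1 = [list1[i] for i in index]
new2 = [list2[i] for i in index]
print(index)  # [1, 2, 4, 5]
print(new1)   # ['A', 'B', 'C', 'D']
print(new2)   # ['y', 'y', 'x', 'y']
```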

Upvotes: 3

Views: 682

Answers (1)

mozway

Reputation: 260455

You could use:

# keep if: l1 is not duplicated     OR  l2 == "y"
df[~df['l1'].duplicated(keep=False) | df['l2'].eq('y')]

output:

  l1 l2
1  A  y
2  B  y
4  C  x
5  D  y
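Since the question's final goal is the list of surviving indices, those can be read straight off the filtered frame's index (the name `out` is illustrative):

```python
import pandas as pd

list1 = ['A', 'A', 'B', 'B', 'C', 'D']
list2 = ['x', 'y', 'y', 'x', 'x', 'y']

df = pd.DataFrame(zip(list1, list2), columns=["l1", "l2"])
# keep rows whose l1 value is not duplicated at all, or whose l2 value is 'y'
out = df[~df['l1'].duplicated(keep=False) | df['l2'].eq('y')]
index = out.index.tolist()
print(index)  # [1, 2, 4, 5]
```

The key difference from the attempt in the question is `keep=False`, which flags *every* occurrence of a duplicated value rather than sparing the first one.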

Upvotes: 3

Related Questions