Remove rows when column values already present as an element of a list in another column

Question

I would like to remove a row entirely when its value in a specific column, such as user, is already present as an element of a list in another column. How can I best accomplish this?

    user          friend
0   jack         [mary, jane, alex]
1   mary         [kate, andrew, jensen]
2   alice        [marina, catherine, howard]
3   andrew       [syp, yuslina, john]
4   catherine    [yute, kelvin]
5   john         [beyond, holand]

Expected Output:

    user                       friend
0   jack           [mary, jane, alex]
2  alice  [marina, catherine, howard]

mozway · Accepted Answer

Your example seems incorrect, as either john should be kept (blacklist is made of all previous friends), or andrew should be removed (blacklist is only the previous list of friends).

Here are different options.

Remove the row if the user is present in:

any set of friends

S = set().union(*df['friend'])

mask = ~df['user'].isin(S)
# [True, False, True, False, False, False]

df[mask]

output:

    user                       friend
0   jack           [mary, jane, alex]
2  alice  [marina, catherine, howard]
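As a possible alternative to building the set by hand, the same "any set of friends" filter can be sketched with `Series.explode`, which flattens the lists into one value per row. This is only an illustration on the sample data from this answer; the names `all_friends` and `out_any` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['jack', 'mary', 'alice', 'andrew', 'catherine', 'john'],
    'friend': [['mary', 'jane', 'alex'],
               ['kate', 'andrew', 'jensen'],
               ['marina', 'catherine', 'howard'],
               ['syp', 'yuslina', 'john'],
               ['yute', 'kelvin'],
               ['beyond', 'holand']],
})

# explode() yields one row per list element, so isin() can test each user
# against every friend that appears anywhere in the frame
all_friends = df['friend'].explode()   # hypothetical name
out_any = df[~df['user'].isin(all_friends)]

print(out_any.index.tolist())
# [0, 2]
```

For very large frames the set-based version above may be cheaper, since `explode` materializes an intermediate Series.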

all previous sets of friends

You can first compute an expanding set of friends, then check whether each user is in the set:

S = set()
# the line below needs Python ≥ 3.8 (walrus operator); on older versions use a classic loop
sets = [(S:=S.union(set(x))) for x in df['friend']]

mask = [u not in s for u,s in zip(df['user'], sets)]
# [True, False, True, False, False, False]
out = df[mask]

output:

    user                       friend
0   jack           [mary, jane, alex]
2  alice  [marina, catherine, howard]
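For Python versions older than 3.8, the "classical loop" mentioned in the comment above could look roughly like the following. It is a sketch on the same sample data, and it keeps the behaviour of the comprehension: the running set already includes the current row's friends when the user is checked. The names `seen` and `out_expanding` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['jack', 'mary', 'alice', 'andrew', 'catherine', 'john'],
    'friend': [['mary', 'jane', 'alex'],
               ['kate', 'andrew', 'jensen'],
               ['marina', 'catherine', 'howard'],
               ['syp', 'yuslina', 'john'],
               ['yute', 'kelvin'],
               ['beyond', 'holand']],
})

seen = set()   # friends collected from the first row up to the current one
mask = []
for user, friends in zip(df['user'], df['friend']):
    seen.update(friends)            # grow the running set in place
    mask.append(user not in seen)   # keep the row only if the user was never listed

out_expanding = df[mask]
print(mask)
# [True, False, True, False, False, False]
```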

only previous set of friends

mask = [u not in s for u,s in zip(df['user'], df['friend'].agg(set).shift(fill_value={}))]
# [True, False, True, True, True, True]

out = df[mask]

output:

        user                       friend
0       jack           [mary, jane, alex]
2      alice  [marina, catherine, howard]
3     andrew         [syp, yuslina, john]
4  catherine               [yute, kelvin]
5       john             [beyond, holand]
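If the `agg(set).shift(...)` one-liner is hard to read, the "only previous set of friends" check can also be sketched with plain lists: convert each list to a set, then pair every user with the set from the row before (an empty set for the first row). This is an illustrative variant on the same sample data, and `friend_sets`, `prev_sets` and `out_prev` are made-up names.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['jack', 'mary', 'alice', 'andrew', 'catherine', 'john'],
    'friend': [['mary', 'jane', 'alex'],
               ['kate', 'andrew', 'jensen'],
               ['marina', 'catherine', 'howard'],
               ['syp', 'yuslina', 'john'],
               ['yute', 'kelvin'],
               ['beyond', 'holand']],
})

friend_sets = [set(x) for x in df['friend']]
# shift by one position: the first row has no previous row, so it gets an empty set
prev_sets = [set()] + friend_sets[:-1]

mask = [u not in s for u, s in zip(df['user'], prev_sets)]
out_prev = df[mask]
print(mask)
# [True, False, True, True, True, True]
```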

used input:

import pandas as pd

d = {'user': ['jack', 'mary', 'alice', 'andrew', 'catherine', 'john'],
     'friend': [['mary', 'jane', 'alex'],
                ['kate', 'andrew', 'jensen'],
                ['marina', 'catherine', 'howard'],
                ['syp', 'yuslina', 'john'],
                ['yute', 'kelvin'],
                ['beyond', 'holand']]}
df = pd.DataFrame(d)
