Reputation: 493
I would like to remove entire rows when the value in a specific column, like user, is already present as an element of a list in another column (friend). How can I best accomplish this?
   user       friend
0  jack       [mary, jane, alex]
1  mary       [kate, andrew, jensen]
2  alice      [marina, catherine, howard]
3  andrew     [syp, yuslina, john]
4  catherine  [yute, kelvin]
5  john       [beyond, holand]
Expected Output:
   user       friend
0  jack       [mary, jane, alex]
2  alice      [marina, catherine, howard]
Upvotes: 1
Views: 73
Reputation: 260790
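Your question can be read in several ways: the blacklist can be made of all friends in the column, of the friends listed up to the current row (an expanding set), or only of the previous row's friends. With your data, the first two readings both give your expected output, while the third one keeps andrew, catherine and john.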
Here are different options.
Remove the user if it is present in any friend list:
S = set().union(*df['friend'])   # all friends, from every row
mask = ~df['user'].isin(S)
# [True, False, True, False, False, False]
out = df[mask]
output:
   user       friend
0  jack       [mary, jane, alex]
2  alice      [marina, catherine, howard]
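The same "any friend list" filter can also be written with Series.explode, which flattens the list column into one name per row. This is a swapped-in alternative rather than the code above, shown as a self-contained sketch that assumes the input from this answer; the name all_friends is only illustrative.

```python
import pandas as pd

# Same input as used in this answer
df = pd.DataFrame({
    'user': ['jack', 'mary', 'alice', 'andrew', 'catherine', 'john'],
    'friend': [['mary', 'jane', 'alex'],
               ['kate', 'andrew', 'jensen'],
               ['marina', 'catherine', 'howard'],
               ['syp', 'yuslina', 'john'],
               ['yute', 'kelvin'],
               ['beyond', 'holand']],
})

# explode() puts each list element on its own row: a flat Series of friend names
all_friends = df['friend'].explode()

# keep only the users that never appear in any friend list
out = df[~df['user'].isin(all_friends)]
print(out)
```

Note that explode turns an empty list into NaN, which never matches a string user, so rows with no friends do not affect the result.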
To only take into account the friends listed up to the current row, first compute an expanding set of friends, then check whether each user is in the corresponding set:
S = set()
# the walrus operator (:=) below needs Python ≥ 3.8; on older versions, use a classic loop
sets = [(S:=S.union(set(x))) for x in df['friend']]
mask = [u not in s for u,s in zip(df['user'], sets)]
# [True, False, True, False, False, False]
out = df[mask]
output:
   user       friend
0  jack       [mary, jane, alex]
2  alice      [marina, catherine, howard]
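For Python versions older than 3.8, the comment in the code above suggests a classic loop instead of the walrus operator. A minimal sketch of that loop, assuming the same input; the names seen and mask are illustrative. Each row's friends are added before the test, so it should match the comprehension's behaviour.

```python
import pandas as pd

# Same input as used in this answer
df = pd.DataFrame({
    'user': ['jack', 'mary', 'alice', 'andrew', 'catherine', 'john'],
    'friend': [['mary', 'jane', 'alex'],
               ['kate', 'andrew', 'jensen'],
               ['marina', 'catherine', 'howard'],
               ['syp', 'yuslina', 'john'],
               ['yute', 'kelvin'],
               ['beyond', 'holand']],
})

seen = set()   # expanding set of all friends met so far
mask = []
for user, friends in zip(df['user'], df['friend']):
    seen.update(friends)           # add this row's friends first, like the comprehension
    mask.append(user not in seen)  # then test the current user against everything seen

out = df[mask]
print(out)
```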
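Remove the user only if it is present in the previous row's friend list:
# the first row has no previous row, so shift fills it with an empty container ({})
mask = [u not in s for u, s in zip(df['user'], df['friend'].agg(set).shift(fill_value={}))]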
# [True, False, True, True, True, True]
out = df[mask]
output:
   user       friend
0  jack       [mary, jane, alex]
2  alice      [marina, catherine, howard]
3  andrew     [syp, yuslina, john]
4  catherine  [yute, kelvin]
5  john       [beyond, holand]
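The shift-based line above pairs each user with the previous row's friends. The same pairing can be spelled out in plain Python, which avoids passing a container as shift's fill_value. This is a sketch under the same input; previous is an illustrative name, and the first row is assumed to have an empty set of previous friends.

```python
import pandas as pd

# Same input as used in this answer
df = pd.DataFrame({
    'user': ['jack', 'mary', 'alice', 'andrew', 'catherine', 'john'],
    'friend': [['mary', 'jane', 'alex'],
               ['kate', 'andrew', 'jensen'],
               ['marina', 'catherine', 'howard'],
               ['syp', 'yuslina', 'john'],
               ['yute', 'kelvin'],
               ['beyond', 'holand']],
})

friends = df['friend'].tolist()
# previous[i] holds the friends of row i-1; row 0 has no predecessor -> empty set
previous = [set()] + [set(f) for f in friends[:-1]]

mask = [user not in prev for user, prev in zip(df['user'], previous)]
out = df[mask]
print(out)
```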
Input used:
import pandas as pd

d = {'user': ['jack', 'mary', 'alice', 'andrew', 'catherine', 'john'],
     'friend': [['mary', 'jane', 'alex'],
                ['kate', 'andrew', 'jensen'],
                ['marina', 'catherine', 'howard'],
                ['syp', 'yuslina', 'john'],
                ['yute', 'kelvin'],
                ['beyond', 'holand']]}
df = pd.DataFrame(d)
Upvotes: 2
Reputation: 24049
You can flatten the friend column into a single list without any nesting using itertools.chain.from_iterable, then filter the rows with pandas.Series.isin.
(andrew appears in [kate, andrew, jensen], so this solution drops that row too.)
import itertools
df = df[~df['user'].isin(list(itertools.chain.from_iterable(df['friend'])))]
Output:
   user       friend
0  jack       [mary, jane, alex]
2  alice      [marina, catherine, howard]
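A self-contained version of this approach, assuming the input from the previous answer, makes the intermediate flattened list visible; flat is an illustrative name. chain.from_iterable concatenates the inner lists in row order, and isin then tests each user against that flat list.

```python
import itertools

import pandas as pd

# Same input as in the previous answer
df = pd.DataFrame({
    'user': ['jack', 'mary', 'alice', 'andrew', 'catherine', 'john'],
    'friend': [['mary', 'jane', 'alex'],
               ['kate', 'andrew', 'jensen'],
               ['marina', 'catherine', 'howard'],
               ['syp', 'yuslina', 'john'],
               ['yute', 'kelvin'],
               ['beyond', 'holand']],
})

# Concatenate all inner lists into one flat list of names, in row order
flat = list(itertools.chain.from_iterable(df['friend']))

# Drop every row whose user appears anywhere in that flat list
out = df[~df['user'].isin(flat)]
print(out)
```

Series.isin also accepts a set, so wrapping the chain in set() instead of list() would work as well and removes duplicate names.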
Upvotes: 2