Reputation: 607
How do I print/return the value based on values from another column?
analyse = input[['SR. NO', 'COUNTRY_NAME']]
print(analyse)
SR. NO COUNTRY_NAME
2 Norway
2 Denmark
2 Iceland
2 Finland
3 Denmark
3 Iceland
4 Finland
4 Norway
Here, I want to check if Norway or Denmark are present for every SR. NO, return those serial numbers where either one of these 2 countries aren't found! I tried using groupby and iterating over countries but that didn't help. I'm stuck at that point.
So, the expected output is:
[3,4]
Upvotes: 6
Views: 3855
Reputation: 863216
You can use set.issubset
for test if all values of list exist per groups:
L = ['Norway', 'Denmark']
s = set(L)
out = df.groupby('SR. NO')['COUNTRY_NAME'].apply(lambda x: s.issubset(x))
Thank you @yatu and @taras for improvement:
s = frozenset(L)
out = df.groupby('SR. NO')['COUNTRY_NAME'].apply(s.issubset)
Then filter index of only True
s values:
out = out.index[~out].tolist()
print (out)
[3, 4]
Another solution with filter in list comprehension:
L = ['Norway', 'Denmark']
s = set(L)
out = [k for k, v in df.groupby('SR. NO')['COUNTRY_NAME'].apply(set).items()
if not s.issubset(v)]
print (out)
[3, 4]
Upvotes: 3
Reputation: 88276
One approach would be to drop_duplicates
, groupby
SR.NO
, take the sum
of each group and check which is less than 2
(meaning that 0
or 1
distinct elements in the list are contained):
l = ['Norway', 'Denmark']
out = (df.drop_duplicates()
.COUNTRY_NAME.isin()
.groupby(df['SR. NO'])
.sum()
.lt(len(l)))
out.index[out].values.tolist()
# [3, 4]
Upvotes: 1