Yash Ghorpade
Yash Ghorpade

Reputation: 607

Checking if dataframe groups contain values from list

How do I print/return the value based on values from another column?

analyse = input[['SR. NO', 'COUNTRY_NAME']]
print(analyse)

SR. NO    COUNTRY_NAME
     2          Norway
     2         Denmark
     2         Iceland
     2         Finland
     3         Denmark
     3         Iceland
     4         Finland
     4          Norway

Here, I want to check if Norway or Denmark are present for every SR. NO, return those serial numbers where either one of these 2 countries aren't found! I tried using groupby and iterating over countries but that didn't help. I'm stuck at that point.

So, the expected output is:

[3,4]

Upvotes: 6

Views: 3855

Answers (2)

jezrael
jezrael

Reputation: 863216

You can use set.issubset for test if all values of list exist per groups:

L = ['Norway', 'Denmark']
s = set(L)
out = df.groupby('SR. NO')['COUNTRY_NAME'].apply(lambda x: s.issubset(x))

Thank you @yatu and @taras for improvement:

s = frozenset(L)
out = df.groupby('SR. NO')['COUNTRY_NAME'].apply(s.issubset)

Then filter index of only Trues values:

out = out.index[~out].tolist()
print (out)
[3, 4]

Another solution with filter in list comprehension:

L = ['Norway', 'Denmark']
s = set(L)
out =  [k for k, v in df.groupby('SR. NO')['COUNTRY_NAME'].apply(set).items() 
               if not s.issubset(v)]
print (out)
[3, 4]

Upvotes: 3

yatu
yatu

Reputation: 88276

One approach would be to drop_duplicates, groupby SR.NO, take the sum of each group and check which is less than 2 (meaning that 0 or 1 distinct elements in the list are contained):

l = ['Norway', 'Denmark']
out = (df.drop_duplicates()
         .COUNTRY_NAME.isin()
         .groupby(df['SR. NO'])
         .sum() 
         .lt(len(l)))
out.index[out].values.tolist()
# [3, 4]

Upvotes: 1

Related Questions