Miszka_R

Reputation: 149

Filtering top n values in pandas

I have the following dataset:

ID   Group Name   Information
1    A            'Info type1'
1    A            'Info type2' 
2    B            'Info type2' 
2    B            'Info type3' 
2    B            'Info type4'
3    A            'Info type2' 
3    A            'Info type5'
3    A            'Info type2' 

Ultimately, I want to count how many items have been processed by a specific group, grouped by a specific Info type.

As a first step, I defined a function to classify the specific info type:

def checkrejcted(strval):
    if strval == 'Info type5':
        return 'Rejected'
    else:
        return 'Not rejected' 

Next, I applied this function to the Information column:

dataset['CheckRejected'] = dataset['Information'].apply(checkrejcted)

Lastly, I dropped the Information column and then dropped duplicates, so the dataset looks like:

ID   Group Name   CheckRejected
1    A            'Not rejected'
2    B            'Not rejected' 
3    A            'Not rejected'
3    A            'Rejected'

I am wondering whether there is a smarter way to count how often a specific group name occurs, grouped by Not rejected / Rejected. A given item can have both the Rejected and Not rejected information at the same time. This is fine, as I assume that within the countplot such an item will be counted for both.

Upvotes: 1

Views: 314

Answers (2)

Valdi_Bo

Reputation: 31011

You wrote that you want to count rows. So probably you need:

df.groupby(['Group Name', 'Information']).size()

For your sample data, the result is the following Series:

Group Name  Information
A           Info type1     1
            Info type2     3
            Info type5     1
B           Info type2     1
            Info type3     1
            Info type4     1
dtype: int64

Its MultiIndex contains the grouping key (both levels) and the value is just the number of occurrences.

Dropping duplicates does not do the job, as you lose the information about how many times each particular combination occurred.

Or, if you want to count only Rejected / Not Rejected cases:

  • map the Information column using your function, creating a new column, say Status,
  • group by Group Name and Status.

The code to do it is:

df['Status'] = df.Information.apply(checkrejcted)
df.groupby(['Group Name', 'Status']).size()
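Putting both steps together, a self-contained sketch (using the question's sample data with the quote characters dropped, as in the output above; the `unstack` step is an extra suggestion to pivot the Status counts into columns, which is convenient for plotting):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 2, 3, 3, 3],
    'Group Name': ['A', 'A', 'B', 'B', 'B', 'A', 'A', 'A'],
    'Information': ['Info type1', 'Info type2', 'Info type2', 'Info type3',
                    'Info type4', 'Info type2', 'Info type5', 'Info type2'],
})

# Derive the status without a helper function
df['Status'] = df['Information'].eq('Info type5').map(
    {True: 'Rejected', False: 'Not rejected'})

# Count per (Group Name, Status) pair, then pivot Status into columns
counts = df.groupby(['Group Name', 'Status']).size().unstack(fill_value=0)
print(counts)
```

`fill_value=0` ensures a group with no rejected rows (here, B) shows 0 instead of NaN.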

Upvotes: 1

oppressionslayer

Reputation: 7222

You could use map with a dictionary, and fillna as the default for non-matching values:

maps = {"'Info type5'": "'Rejected'"}

or

maps = {
    "'Info type1'": "'Not Rejected'",
    "'Info type2'": "'Not Rejected'",
    "'Info type3'": "'Not Rejected'",
    "'Info type4'": "'Not Rejected'",
    "'Info type5'": "'Rejected'",
}

df['Information'].map(maps).fillna("'Not Rejected'")

0    'Not Rejected'
1    'Not Rejected'
2    'Not Rejected'
3    'Not Rejected'
4    'Not Rejected'
5    'Not Rejected'
6        'Rejected'
7    'Not Rejected'

df['CheckRejected'] = df['Information'].map(maps).fillna("'Not Rejected'")

   ID Group Name   Information   CheckRejected
0   1          A  'Info type1'  'Not Rejected'
1   1          A  'Info type2'  'Not Rejected'
2   2          B  'Info type2'  'Not Rejected'
3   2          B  'Info type3'  'Not Rejected'
4   2          B  'Info type4'  'Not Rejected'
5   3          A  'Info type2'  'Not Rejected'
6   3          A  'Info type5'      'Rejected'
7   3          A  'Info type2'  'Not Rejected'

df.drop(columns='Information').drop_duplicates()

   ID Group Name   CheckRejected
0   1          A  'Not Rejected'
2   2          B  'Not Rejected'
5   3          A  'Not Rejected'
6   3          A      'Rejected'
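Continuing this approach, the counting the question asks for can then be sketched on the deduplicated frame (sample data reconstructed from the question, values including their quote characters; `maps` is the one-entry dictionary above):

```python
import pandas as pd

# Sample data from the question (Information values include the quotes)
df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 2, 3, 3, 3],
    'Group Name': ['A', 'A', 'B', 'B', 'B', 'A', 'A', 'A'],
    'Information': ["'Info type1'", "'Info type2'", "'Info type2'",
                    "'Info type3'", "'Info type4'", "'Info type2'",
                    "'Info type5'", "'Info type2'"],
})

# Map only the rejected type; everything else falls through to the default
maps = {"'Info type5'": "'Rejected'"}
df['CheckRejected'] = df['Information'].map(maps).fillna("'Not Rejected'")

# Deduplicate as in the answer, then count (Group Name, CheckRejected) pairs
dedup = df.drop(columns='Information').drop_duplicates()
counts = dedup.groupby(['Group Name', 'CheckRejected']).size()
print(counts)
```

Note that counting after drop_duplicates counts distinct (ID, Group Name, status) combinations, not raw rows, which matches how the question's final table is built.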

Upvotes: 1
