Reputation: 149
I have the following dataset:
ID Group Name Information
1 A 'Info type1'
1 A 'Info type2'
2 B 'Info type2'
2 B 'Info type3'
2 B 'Info type4'
3 A 'Info type2'
3 A 'Info type5'
3 A 'Info type2'
Ultimately, I want to count, how many items have been processed by a specific group and group them by a specific Info type
.
In a first step I have defined a function to somehow filter the specific info type
:
def checkrejcted(strval):
if strval == 'Info type5':
return 'Rejected'
else:
return 'Not rejected'
In a next step, I have applied this function to the information
column:
dataset['CheckRejected'] = dataset['Information'].apply(checkrejcted)
Lastly, I have dropped duplicates, after dropping the information
column.
So the dataset looks like:
ID Group Name CheckRejected
1 A 'Not rejected'
2 B 'Not rejected'
3 A 'Not rejected'
3 A 'Rejected'
I am wondering, whether there is a smarter way to count how often a specific group name occurs and group it based on Not rejected
, Rejected
. It can happen, that specific items can have the information
Rejected
/Not rejected
at the same time. This is fine, as I assume that within the countplot this item will be counted for both.
Upvotes: 1
Views: 314
Reputation: 31011
You wrote that you want to count rows. So probably you need:
df.groupby(['Group Name', 'Information']).size()
For your sample data, the result is the following Series:
Group Name Information
A Info type1 1
Info type2 3
Info type5 1
B Info type2 1
Info type3 1
Info type4 1
dtype: int64
Its MultiIndex contains the grouping key (both levels) and the value is just the number of occurrences.
Dropping duplicetes does not do the job, as you loose the information how many times particular combination occurred.
Or if you want to count only Rejected / Not Rejected cases then:
The code to do it is:
df['Status'] = df.Information.apply(checkrejcted)
df.groupby(['Group Name', 'Status']).size()
Upvotes: 1
Reputation: 7222
You could use a map, and fillna with a default non matching action:
maps = { "'Info type5'": "'Rejected'" }
or
maps = { "'Info type1'": "'Not Rejected'", "'Info type2'": "'Not Rejected'" , "'Info type3'": "'Not Rejected'" , "'Info type4'": "'Not Rejected'", "'Info type5'": "'Rejected'" }
df['Information'].map(maps).fillna('Not Rejected')
0 'Not Rejected'
1 'Not Rejected'
2 'Not Rejected'
3 'Not Rejected'
4 'Not Rejected'
5 'Not Rejected'
6 'Rejected'
7 'Not Rejected'
df['CheckRejected'] = df['Information'].map(maps).fillna("'Not Rejected'")
ID Group Name Information CheckRejected
0 1 A 'Info type1' 'Not Rejected'
1 1 A 'Info type2' 'Not Rejected'
2 2 B 'Info type2' 'Not Rejected'
3 2 B 'Info type3' 'Not Rejected'
4 2 B 'Info type4' 'Not Rejected'
5 3 A 'Info type2' 'Not Rejected'
6 3 A 'Info type5' 'Rejected'
7 3 A 'Info type2' 'Not Rejected'
df.drop(columns='Information').drop_duplicates()
ID Group Name CheckRejected
0 1 A 'Not Rejected'
2 2 B 'Not Rejected'
5 3 A 'Not Rejected'
6 3 A 'Rejected'
Upvotes: 1