Reputation: 23
I have a pandas dataframe like this:
ID | Tier1 | Tier2 |
---|---|---|
1111 | RF | B |
1111 | OK | B |
2222 | RF | B |
2222 | RF | E |
3333 | OK | B |
3333 | LO | B |
I need to cut down the table so the IDs are unique, but do so with the following hierarchy: RF>OK>LO for Tier1. Then B>E for Tier2.
So the expected output will be:
ID | Tier1 | Tier2 |
---|---|---|
1111 | RF | B |
2222 | RF | B |
2222 | RF | E |
3333 | OK | B |
then:
ID | Tier1 | Tier2 |
---|---|---|
1111 | RF | B |
2222 | RF | B |
3333 | OK | B |
I am struggling to figure out how to this. My initial attempt is to group the table with grouped = df.groupby('ID')
and then:
grouped = df.groupby('ID')
for key, group in grouped:
check_rf = group['Tier1']=='RF'
check_ok = group['Tier1']=='OK'
if check_rf.any():
group = group[group['Tier1']=='RF']
elif check_ok.any():
#and so on
I think this is working to filter each group, but I have no idea how the groups can then relate back to the parent table (df). And I am sure there is a better way to do this.
Thanks!
Upvotes: 2
Views: 336
Reputation: 23099
Let's use pd.Categorical
& drop_duplicates
df['Tier1'] = pd.Categorical(df['Tier1'],['RF','OK','LO'],ordered=True)
df['Tier2'] = pd.Categorical(df['Tier2'],['B','E'],ordered=True)
df1 = df.sort_values(['Tier1','Tier2']).drop_duplicates(subset=['ID'],keep='first')
print(df1)
ID Tier1 Tier2
0 1111 RF B
2 2222 RF B
4 3333 OK B
Looking at Tier1
you can see the ordering.
print(df['Tier1'])
0 RF
1 OK
2 RF
3 RF
4 OK
5 LO
Name: Tier1, dtype: category
Categories (3, object): ['RF' < 'OK' < 'LO']
Upvotes: 3
Reputation: 50498
You can use two groupby
+agg
Pandas calls. Since the ordering RF>OK>LO
and B>E
are respectively compliant the (reverse) lexicographic ordering, you can use the trivial min
/max
functions for the aggregation (otherwise you can write your own custom min-max functions).
Here is how to do that (using a 2-pass filtering):
tmp = df.groupby(['ID', 'Tier2']).agg(max).reset_index() # Step 1
output = tmp.groupby(['ID', 'Tier1']).agg(min).reset_index() # Step 2
Here is the result in output
:
ID Tier1 Tier2
0 1111 RF B
1 2222 RF B
2 3333 OK B
Upvotes: 2