8-Bit Borges
8-Bit Borges

Reputation: 10033

Apply filter to dataframe based on conditions

I have this df:

df = pd.DataFrame({
  'Team': [
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'X', 'Y', 'Z'
  ],
  'Ranking': [
    2, 6, 6, 1, 8, 9, 16, 6, 16, 8, 6, 3, 1, 16, 9, 1, 2, 1, 16, 16, 16, 9, 9, 8, 8
  ],
  'Points': [
    1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1
  ]
})

And I need to apply a filter to it using the following logic:

How can I apply the logic above to the dataframe?

Desired Result:

Teams /
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
Y
Z

Upvotes: 0

Views: 55

Answers (2)

Ralubrusto
Ralubrusto

Reputation: 1501

I've tried a solution, but the answer was quite different from the one you've written, probably due to a mistyping. Anyway, here it is a way of doing what you want.

To improve readability, I made a function to return the numbers of items to keep depending on their rank.

def items2keep(ranking):
    return 4 if ranking < 5 else 3 if ranking < 13 else 2 if ranking < 17 else 1

result = []
for rank in df['Ranking'].unique():
    _df = df[df['Ranking'] == rank].nlargest(
        n=items2keep(rank),
        columns=['Points']
    )
    result.append(_df)

final_df = pd.concat(result)

If we print final_df we get:

print(final_df.sort_values(['Ranking', 'Points'], ascending=[True, False]))

   Team  Ranking  Points
15    P        1       3
3     D        1       1
12    M        1       1
17    R        1       1
0     A        2       1
16    Q        2       1
11    L        3       1
1     B        6       1
2     C        6       1
7     H        6       1
9     J        8       2
4     E        8       1
23    Y        8       1
14    O        9       3
5     F        9       2
21    V        9       1
6     G       16       1
8     I       16       1

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150745

Let's use pd.cut to map the Rankings to numbers of rows to extract, then compare those to the relative row numbers with groupby().cumcount() :

thresh = pd.cut(df['Ranking'], bins=[0,4,12,16,20], 
                labels=[4,3,2,1]).astype(int)

df.loc[df.sort_values(['Points'])
         .groupby('Ranking').cumcount().lt( thresh), 'Team']

Output:

0     A
1     B
2     C
3     D
4     E
5     F
7     H
11    L
12    M
15    P
16    Q
17    R
19    T
20    U
21    V
22    X
23    Y
24    Z
Name: Team, dtype: object

Upvotes: 1

Related Questions