Reputation: 1604
I have the following dataset (the id column is not unique):
id data country
1 8 B
2 15 A
3 14 D
3 19 D
3 8 C
3 20 A
For rows with country ANYTHING BUT "A", I want to add a "rank" column.
For rows with country "A", I want to leave "rank" value empty (or 0).
Expected output:
id data country rank
1 8 B 1
2 15 A 0
3 14 D 3
3 19 D 4
3 8 C 2
3 20 A 0
The post "Pandas rank by column value" gives great insight.
I can try:
df['rank'] = df['data'].rank(ascending=True)
but I don't know how to take "country" into account.
Upvotes: 1
Views: 7024
Reputation: 785
EDIT: This was written before the question was edited, so it doesn't do exactly what the OP wants.
df['rank_A'] = df.data[df['country']=='A'].rank(ascending=True)
Tested on this:
import pandas as pd
from pandas import DataFrame
import numpy as np
df2 = DataFrame(np.random.randn(5, 2))
df2.columns = ['A','B']
df2['rank'] = df2.A[df2['B']>0].rank(ascending=True)
df2
which gives the ranking according to column A for rows in which B is greater than zero; the remaining rows get NaN in the rank column.
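The same boolean-mask idea can probably be adapted to the edited question by inverting the condition and filling the gaps. This is only a sketch using the question's sample data. It assumes that ties should be broken by order of appearance (method="first"), which is what the expected output suggests for the two rows with data 8, and that 0 is the preferred placeholder for country "A".

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 3, 3, 3],
    "data": [8, 15, 14, 19, 8, 20],
    "country": ["B", "A", "D", "D", "C", "A"],
})

# Select every row whose country is anything but "A"
mask = df["country"] != "A"

# Rank only the selected rows; assignment aligns on the index,
# so the rows left out of the mask end up as NaN.
# method="first" is an assumption: ties are ranked in order of appearance.
df["rank"] = df.loc[mask, "data"].rank(method="first", ascending=True)

# Replace the NaN placeholders for country "A" with 0 and use integers
df["rank"] = df["rank"].fillna(0).astype(int)

print(df)
```

With the default method="average", the two rows with data 8 would both get rank 1.5 instead of 1 and 2, so the choice of method depends on how ties should be handled.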
Upvotes: 5