kstats9pt3
kstats9pt3

Reputation: 873

How do I create a decile column in Python polars?

Let's say I have a column of FICO scores. I'd like to create another column FICO_DECILE that ranks the FICO scores descending and assigns a decile group, i.e. FICO=850 would have FICO_DECILE=1, and something like FICO=360 would have FICO_DECILE=10.

I tried:

# decile rank
df1 = df.with_columns(
    (
      (pl.col('fico').rank(method='dense')/df.height*10).cast(pl.UInt32).alias('fico_decile')
    )
)

But I only get DECILE_GROUP equal to 0 and null.

Upvotes: 1

Views: 119

Answers (1)

D.lola
D.lola

Reputation: 2274

I have not tested but this should work, overall, I think you are on the right track, The main reason why you are getting that error is basically how you are using rank().

The trick is to use descending: bool = False

I have modified your code to include this.

df1 = df.with_columns(
    (
        (pl.col('fico').rank(method='min', descending=True) / df.height * 10)
        .ceil()
        .cast(pl.UInt32)
        .alias('fico_decile')
    )
)

Since you are getting null values we could do a bit of data wrangling here

see ref: lit

# filter out null FICO scores
fico_df = df.filter(pl.col('fico').is_not_null())

# rank for non-null FICO scores
fico_df = fico_df.with_columns(
    (
        (pl.col('fico')
         .rank(method='min', descending=True)  
         / fico_df.height * 10)  
        .ceil()
        .cast(pl.UInt32) 
        .alias('fico_decile')
    )
)

# handle null with lit 
null_fico_df = df.filter(pl.col('fico').is_null()).with_columns(
    pl.lit(None).alias('fico_decile')
)


# df with valid and invalid
df_final = pl.concat([fico_df, null_fico_df])

# results
df_final.select(['fico', 'fico_decile']).show()

Upvotes: 2

Related Questions