samba
samba

Reputation: 3101

row_number ranking function to filter the latest records in DF

I want to apply a Window function to a DataFrame to get only the latest metrics for every Id. For the following data I expect the df to contain only the first two records after applying a Window function.

| id | metric | transaction_date |
| 1  | 0.5    |  05-10-2019      | 
| 2  | 15.9   |  07-22-2020      |
| 2  | 4.7    |  11-03-2017      |

Is it a correct approach to use row_number ranking function? My current implementation looks like this:

    df.withColumn(
        "_row_number",
        F.row_number().over(
            Window.partitionBy("id").orderBy(F.desc("transaction_date")))
    )
    .filter(F.col("_row_number") == 1)
    .drop("_row_number")

Upvotes: 2

Views: 347

Answers (2)

maede rayati
maede rayati

Reputation: 786

You need to first sort the dataframe by id and date (descending). Then you do group by id. The first method on group by object will return the first row (which has the latest date).

import pandas as pd
from datetime import datetime

df = pd.DataFrame({'id':[1,2,2], 
'metric':[0.5, 15.9, 4.7], 
'date':[datetime(2019,5,10), datetime(2020,7,22), datetime(2017,11,3)]})


## sort df by id and date
df = df.sort_values(['id','date'], ascending= [True, False])

## return the first row of each group
df.groupby('id').first()




Upvotes: 1

mvasyliv
mvasyliv

Reputation: 1214

  val fDF = Seq( (1, 0.5,  "05-10-2019"),
    (2, 15.9,  "07-22-2020"),
    (2, 4.7,  "11-03-2017"))
    .toDF("id", "metric", "transaction_date")

  val f1DF = fDF
    .withColumn("transaction_date", to_date('transaction_date, "MM-dd-yyyy"))
    .orderBy('id.asc,'transaction_date.desc)

  val f2DF = f1DF.groupBy("id")
    .agg(first('transaction_date).alias("transaction_date"),
      first('metric).alias("metric"))

  f2DF.show(false)
  //      +---+----------------+------+
  //      |id |transaction_date|metric|
  //      +---+----------------+------+
  //      |1  |2019-05-10      |0.5   |
  //      |2  |2020-07-22      |15.9  |
  //      +---+----------------+------+

Upvotes: 0

Related Questions