user88484

Reputation: 1557

Pandas groupby keep rows according to ranking

I have this dataframe:

    date        value       source
0   2020-02-14  0.438767    L8-SR
1   2020-02-15  0.422867    S2A-SR
2   2020-03-01  0.657453    L8-SR
3   2020-03-01  0.603989    S2B-SR
4   2020-03-11  0.717264    S2B-SR
5   2020-04-02  0.737118    L8-SR

I would like to group by the date column and keep one row per date according to a ranking/importance of my choosing over the source column. For example, my ranking is L8-SR > S2B-SR > GP6_r, meaning that for all rows with the same date, keep the row where source == L8-SR; if none contains L8-SR, keep the row where source == S2B-SR, and so on. How can I accomplish that with pandas groupby?

Output should look like this:

    date        value       source
0   2020-02-14  0.438767    L8-SR
1   2020-02-15  0.422867    S2A-SR
2   2020-03-01  0.657453    L8-SR
3   2020-03-11  0.717264    S2B-SR
4   2020-04-02  0.737118    L8-SR

Upvotes: 1

Views: 65

Answers (2)

Ashish

Reputation: 1

Try the code below for the group-by operation. To order the result afterwards, you can use sort_values:

# Import the pandas library
import pandas as pd

# Build a dictionary holding the data from the table in the question
pandasdata_dict = {'date': ['2020-02-14', '2020-02-15', '2020-03-01', '2020-03-01', '2020-03-11', '2020-04-02'],
        'value': [0.438767, 0.422867, 0.657453, 0.603989, 0.717264, 0.737118],
        'source': ['L8-SR', 'S2A-SR', 'L8-SR', 'S2B-SR', 'S2B-SR', 'L8-SR']}

# Convert the dictionary to a data frame
df = pd.DataFrame(pandasdata_dict)

# Display the data frame
df

# Convert the date column to datetime
df["date"] = pd.to_datetime(df["date"])

# Group the data frame by calendar date (this returns a GroupBy object; no rows are selected yet)
df.groupby([df['date'].dt.date])
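
A groupby on its own only builds the groups; it does not pick a row per date. One possible way to finish the job is sketched below, assuming the ranking from the question is expressed as a dictionary. The names `ranking`, `rank`, `best` and `result` are illustrative, and sending unranked sources (such as S2A-SR) to the lowest priority is an assumption rather than something the question specifies.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-02-14", "2020-02-15", "2020-03-01",
             "2020-03-01", "2020-03-11", "2020-04-02"],
    "value": [0.438767, 0.422867, 0.657453, 0.603989, 0.717264, 0.737118],
    "source": ["L8-SR", "S2A-SR", "L8-SR", "S2B-SR", "S2B-SR", "L8-SR"],
})

# Ranking from the question; a lower number means a higher priority.
ranking = {"L8-SR": 0, "S2B-SR": 1, "GP6_r": 2}

# Assumption: sources missing from the ranking get the lowest priority.
rank = df["source"].map(ranking).fillna(len(ranking))

# For each date, idxmin gives the index label of the best-ranked row.
best = rank.groupby(df["date"]).idxmin()

# Select those rows and renumber the index.
result = df.loc[best.to_numpy()].reset_index(drop=True)
print(result)
```

On the sample data this keeps the L8-SR row for 2020-03-01 and leaves every other date unchanged, which matches the output requested in the question.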

Upvotes: 0

Quang Hoang

Reputation: 150805

Let's try category dtype and drop_duplicates:

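orders = ['L8-SR','S2B-SR','GP6_r']

# append sources that are not in the ranking (e.g. S2A-SR) so they do not become NaN
categories = orders + [s for s in df['source'].unique() if s not in orders]

# assign the ordered categorical back to the column
df['source'] = pd.Categorical(df['source'], categories=categories, ordered=True)

df.sort_values(['date','source']).drop_duplicates(['date'])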

Output:

         date     value  source
0  2020-02-14  0.438767   L8-SR
1  2020-02-15  0.422867  S2A-SR
2  2020-03-01  0.657453   L8-SR
4  2020-03-11  0.717264  S2B-SR
5  2020-04-02  0.737118   L8-SR
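
Two details of the category dtype are easy to miss here: `Series.cat.set_categories` returns a new Series rather than modifying the original, and any value missing from the new category list becomes NaN. A small sketch on a made-up three-value series (the names `raw`, `s`, `recoded` and `safe` are illustrative only) shows both effects and one way to keep unranked values:

```python
import pandas as pd

raw = pd.Series(["L8-SR", "S2A-SR", "S2B-SR"])
s = raw.astype("category")
orders = ["L8-SR", "S2B-SR", "GP6_r"]

# set_categories returns a new Series; s keeps its original categories.
recoded = s.cat.set_categories(orders, ordered=True)
print(list(s.cat.categories))

# S2A-SR is not in orders, so it is NaN in the recoded Series.
print(recoded.isna().tolist())

# Appending the unranked values after the ranked ones avoids the NaN.
categories = orders + [v for v in raw.unique() if v not in orders]
safe = pd.Categorical(raw, categories=categories, ordered=True)
print(list(safe.categories))
```

Placing the unranked values last means they are only kept for a date when no ranked source is present, which is one reasonable reading of the question.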

Upvotes: 1
