Reputation: 49
I am new to pandas, so please bear with me. I want to apply mode on a DataFrame, but with a tweak. The columns Item_1, Item_2 and Item_3 can each take four values [High, Medium, Low, NA], with High having the highest priority and Low the lowest. NA should be ignored.
Data is as shown below.
date Type Item_1 Item_2 Item_3 Price
0 2021-01-01 A High Low Low 20
1 2021-01-01 A High Low Low 20
2 2021-01-01 A Low High Low 20
3 2021-01-01 A Medium High High 20
4 2021-01-01 B Low High Low 15
5 2021-01-01 B Medium High High 15
6 2021-01-01 B Low Low Medium 15
7 2021-01-02 A NA High NA 30
8 2021-01-02 A NA High NA 30
9 2021-01-02 A NA NA NA 30
10 2021-01-02 A NA NA Low 30
11 2021-01-02 A NA NA Low 30
12 2021-01-02 A NA Low High 30
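For anyone who wants to reproduce this, the sample data above can be reconstructed as a DataFrame (column names exactly as in the question; "NA" is a literal string, not a missing value):

```python
import pandas as pd

# Rebuild the sample data shown above; "NA" is a plain string here
df = pd.DataFrame({
    "date": ["2021-01-01"] * 7 + ["2021-01-02"] * 6,
    "Type": ["A"] * 4 + ["B"] * 3 + ["A"] * 6,
    "Item_1": ["High", "High", "Low", "Medium", "Low", "Medium", "Low",
               "NA", "NA", "NA", "NA", "NA", "NA"],
    "Item_2": ["Low", "Low", "High", "High", "High", "High", "Low",
               "High", "High", "NA", "NA", "NA", "Low"],
    "Item_3": ["Low", "Low", "Low", "High", "Low", "High", "Medium",
               "NA", "NA", "NA", "Low", "Low", "High"],
    "Price": [20, 20, 20, 20, 15, 15, 15, 30, 30, 30, 30, 30, 30],
})
```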
Expected Output:
date Type Item_1 Item_2 Item_3 Price
0 2021-01-01 A High High Low 20
1 2021-01-01 B Low High High 15
2 2021-01-02 A NA High Low 30
My use case: as you can see, in the first output row Item_1 now has the value High because it is the most frequent.
In Item_2, Low and High have the same frequency, but High has higher priority, so the output value is High.
In Item_3, Low is the most frequent, so the value is Low.
Edit:
NA should be ignored, and the second most frequent value should be the output: for date 2021-01-02, NA has the highest frequency, but the second most frequent value is High.
The value NA does not correspond to NaN; rather, it is a string object, just like the other categorical values. Whenever there is even a single High/Low/Medium for a date, irrespective of the frequency of NA, I want to choose the categorical value with the second highest occurrence/frequency.
Upvotes: 1
Views: 444
Reputation: 862911
I think there are multiple rows with different dates, so first reshape by DataFrame.melt, then convert the values to ordered categoricals, get the first mode per group in GroupBy.agg with a lambda function, and last reshape back by Series.unstack:
import pandas as pd

cols = df.columns
df = df.melt(['date','Type','Price'])
# "NA" is not listed in categories, so it becomes NaN and Series.mode ignores it
df["value"] = pd.Categorical(df["value"],
                             categories=["High", "Medium", "Low"],
                             ordered=True)
# fall back to "NA" for groups with only missing values, else mode() is empty
df = (df.groupby(['date','Type','Price','variable'])['value']
        .agg(lambda x: x.mode().iat[0] if not x.mode().empty else "NA")
        .unstack()
        .rename_axis(None, axis=1)
        .reset_index()
        .reindex(cols, axis=1))
print (df)
date Type Item_1 Item_2 Item_3 Price
0 2021-01-01 A High High Low 20
1 2021-01-01 B Low High High 15
2 2021-01-02 A NA High Low 30
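Why the literal "NA" strings drop out: since "NA" is not in the categories list, pd.Categorical maps it to NaN, and Series.mode() skips NaN by default. A minimal check with made-up values:

```python
import pandas as pd

# "NA" is absent from categories, so those entries become NaN
s = pd.Series(pd.Categorical(["NA", "High", "NA", "Low", "High"],
                             categories=["High", "Medium", "Low"],
                             ordered=True))
print(s.isna().sum())  # 2 -- both "NA" strings became NaN
print(s.mode())        # NaN ignored, so High (count 2) is the single mode
```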
Upvotes: 1
Reputation: 1204
You could turn your Item columns into categoricals and provide the ranking (your priority). For example, for Item_2:
df["Item_2"] = pd.Categorical(df["Item_2"], ["High", "Medium", "Low"])
When you use mode, it returns multiple values if more than one value has the highest frequency. But it will also order the results: by default in alphabetical order, but because you turned your column into a categorical, it will use the ranking instead.
df["Item_2"].mode()
Will return
0 High
1 Low
Name: Item_2, dtype: category
Using the [0] index you can always extract the top value.
df["Item_2"].mode()[0]
If you apply that to all columns you should get what you are looking for.
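Putting that idea into a per-group aggregation could look like this sketch on a toy slice of the data; first_mode is a helper name introduced here (not from the answer), which falls back to the string "NA" when a group has no non-missing values:

```python
import pandas as pd

# Toy slice: one date/Type group with a literal "NA" entry
df = pd.DataFrame({
    "date": ["2021-01-02"] * 3,
    "Type": ["A"] * 3,
    "Item_2": ["High", "NA", "Low"],
})

# Provide the ranking; "NA" is not a category, so it becomes NaN
df["Item_2"] = pd.Categorical(df["Item_2"], ["High", "Medium", "Low"])

def first_mode(s):
    m = s.mode()  # ties come back sorted by the categorical ranking
    return m.iat[0] if not m.empty else "NA"

out = df.groupby(["date", "Type"])["Item_2"].agg(first_mode).reset_index()
print(out)  # High and Low tie, but High ranks first
```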
Upvotes: 1