looping over grouped dataframe with multiple conditions

Question

I have a CSV file that looks like the table below. For each folder, I want to return the image with the highest probability of being a 'Dog'. Each folder can only return one image. If no Dog is present, the 'Cat' with the highest probability should be the primary image. If there's no Cat, the 'Bird' with the highest probability should be the primary image, and so on.

CSV:

FolderName     ImageName    Predictions    Probabilities
   ABC           MyPet           Dog            0.98
   ABC           HisPet          Cat            0.90
   DEF           HerPet          Bird           0.83
   ABC           NotPet          Dog            0.23
   DEF           asdf            Dog            0.78
   DEF           M123            Cat            0.19
   GHI           M123s           Cat            0.89
   GHI           M13             Cat            0.19

I was only able to return the image with the highest probability. How can I prioritize the Predictions column first and then the Probabilities column?

df.loc[df.groupby('FolderName')['Probabilities'].idxmax()]

The code returns

FolderName     ImageName    Predictions    Probabilities
   ABC           MyPet           Dog            0.98
   DEF           HerPet          Bird           0.83
   GHI           M123s           Cat            0.89

Desired result:

FolderName     ImageName    Predictions    Probabilities
   ABC           MyPet           Dog            0.98
   DEF           asdf            Dog            0.78
   GHI           M123s           Cat            0.89

cs95 · Accepted Answer

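This can be done by converting "Predictions" to an ordered Categorical column whose category order is the priority order (Dog, then Cat, then Bird). Sort by that column and then by descending "Probabilities", and call drop_duplicates, which keeps the first row for each FolderName.

import pandas as pd

# Category order defines the priority: Dog first, then Cat, then Bird.
df['Predictions'] = pd.Categorical(
    df['Predictions'], categories=['Dog', 'Cat', 'Bird'], ordered=True)

# Best category first, highest probability first; keep one row per folder.
(df.sort_values(['Predictions', 'Probabilities'], ascending=[True, False])
   .drop_duplicates('FolderName'))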

  FolderName ImageName Predictions  Probabilities
0        ABC     MyPet         Dog           0.98
4        DEF      asdf         Dog           0.78
6        GHI     M123s         Cat           0.89
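
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is typed in by hand from the table in the question, and the names `priority`, `ranked`, and `primary` are my own and only for illustration. It uses `assign` so the original `df` is left unchanged, which is a choice I made and not part of the answer above.

```python
import pandas as pd

# Sample data copied by hand from the question's table.
df = pd.DataFrame({
    "FolderName": ["ABC", "ABC", "DEF", "ABC", "DEF", "DEF", "GHI", "GHI"],
    "ImageName": ["MyPet", "HisPet", "HerPet", "NotPet", "asdf", "M123", "M123s", "M13"],
    "Predictions": ["Dog", "Cat", "Bird", "Dog", "Dog", "Cat", "Cat", "Cat"],
    "Probabilities": [0.98, 0.90, 0.83, 0.23, 0.78, 0.19, 0.89, 0.19],
})

# Hypothetical name: labels listed from highest to lowest priority.
priority = ["Dog", "Cat", "Bird"]

# Work on a copy so the original "Predictions" column keeps its string dtype.
ranked = df.assign(
    Predictions=pd.Categorical(df["Predictions"], categories=priority, ordered=True)
)

primary = (
    ranked.sort_values(["Predictions", "Probabilities"], ascending=[True, False])
          .drop_duplicates("FolderName")  # keep="first" is the default
          .sort_values("FolderName")      # only to make the output order predictable
)

print(primary)
```

One thing to watch: any label that is not listed in `priority` becomes NaN in the Categorical column, and `sort_values` puts NaN last by default, so such rows are picked only when a folder has nothing else.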
