n_user184

Reputation: 23

Get Maximum Intersection as an Aggregate Function in Python

I have a DataFrame like the one below (the favorite_food column is available either as arrays, as shown, or unnested into one row per food):

team  | player     | favorite_food
  A   | A_player1  | [pizza, sushi]
  A   | A_player2  | [salad, sushi]
  B   | B_player1  | [pizza, pasta, salad, taco]
  B   | B_player2  | [taco, salad, sushi]
  B   | B_player3  | [taco]

I want to get the number and percentage of foods that all players on a team have in common, per team. Something like this:

team  | #_food_common | percent_food_common
  A   | 1             |  0.33
  B   | 1             |  0.2

What is a good way to do this in Python, preferably with pandas?

Upvotes: 1

Views: 43

Answers (1)

mozway

Reputation: 262224

You can use set operations and groupby.agg:

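# Convert each list to a set, then aggregate the sets per team.
(df['favorite_food'].apply(set)
 .groupby(df['team'])
 .agg(**{
     # foods that every player on the team lists
     '#_food_common': lambda x: len(set.intersection(*x)),
     # shared foods as a fraction of all distinct foods on the team
     'percent_food_common': lambda x: len(set.intersection(*x))/len(set.union(*x)),
 })
 .reset_index()
)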

Output:

  team  #_food_common  percent_food_common
0    A              1             0.333333
1    B              1             0.200000
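
To see where these numbers come from, here is a minimal plain-Python sketch of what each lambda computes for a single group, using team B from the question. The variable names (team_b, common, all_foods) are only for illustration and do not appear in the code above.

```python
# Favorite foods of the three team B players from the question.
team_b = [
    {"pizza", "pasta", "salad", "taco"},
    {"taco", "salad", "sushi"},
    {"taco"},
]

common = set.intersection(*team_b)   # foods that every player lists
all_foods = set.union(*team_b)       # foods that at least one player lists

n_common = len(common)
percent_common = n_common / len(all_foods)

print(common)          # {'taco'}
print(n_common)        # 1
print(percent_common)  # 0.2
```

The percentage is therefore the size of the intersection divided by the size of the union (5 distinct foods for team B), which matches the 0.2 in the output.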

Used input:

import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B'],
                   'player': ['A_player1', 'A_player2', 'B_player1', 'B_player2', 'B_player3'],
                   'favorite_food': [['pizza', 'sushi'],
                                     ['salad', 'sushi'],
                                     ['pizza', 'pasta', 'salad', 'taco'],
                                     ['taco', 'salad', 'sushi'],
                                     ['taco']]})
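
Since the question mentions the data may also be available unnested, here is a rough sketch for that layout, assuming one row per (team, player, food). It is an alternative to the set-based approach, not part of the original answer, and the names long_df, team_size, per_food, and out are made up for this example. A food counts as common when the number of distinct players listing it equals the team size.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B'],
                   'player': ['A_player1', 'A_player2', 'B_player1', 'B_player2', 'B_player3'],
                   'favorite_food': [['pizza', 'sushi'],
                                     ['salad', 'sushi'],
                                     ['pizza', 'pasta', 'salad', 'taco'],
                                     ['taco', 'salad', 'sushi'],
                                     ['taco']]})

# Stand-in for the unnested input: one row per (team, player, food).
long_df = df.explode('favorite_food')

# Number of distinct players on each team.
team_size = long_df.groupby('team')['player'].nunique()

# Number of distinct players listing each food, per team.
per_food = (long_df.groupby(['team', 'favorite_food'])['player']
            .nunique()
            .reset_index(name='n_players'))

# A food is common when every player on the team lists it.
per_food['is_common'] = per_food['n_players'] == per_food['team'].map(team_size)

# Count of common foods, and their share among all distinct foods of the team.
out = (per_food.groupby('team')
       .agg(**{'#_food_common': ('is_common', 'sum'),
               'percent_food_common': ('is_common', 'mean')})
       .reset_index())
print(out)
```

Because per_food has one row per distinct food on a team, the mean of is_common is the same intersection-over-union ratio as in the set-based version.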

Upvotes: 3
