Group DataFrame by column, perform operation on members, and output results in new DataFrame

Question

I have some DataFrame:

df = pd.DataFrame({'type':['Apple', 'Apple', 'Apple'],
                   'subtype':['Fuji', 'Cortland', 'Ambrosia'],
              'score':[1,5,10], 
              'distance':[25,50,75]})

I would like to group this DataFrame by type, calculate the distance ratios for all members (e.g. 25/50, 25/75, 50/75), and return the output of this operation in a new DataFrame, such as:

jezrael · Accepted Answer

I believe you need cross join by type first, then remove rows with same value of subtypes by Series.ne (!=) with boolean indexing and last create ratio column with DataFrame.assign and division with Series.div:

df = df.merge(df, on='type')
df = (df[df['subtype_x'].ne(df['subtype_y'])]
              .assign(ratio=df['distance_x'].div(df['distance_y'])))
print (df)

    type subtype_x  score_x  distance_x subtype_y  score_y  distance_y  \
1  Apple      Fuji        1          25  Cortland        5          50   
2  Apple      Fuji        1          25  Ambrosia       10          75   
3  Apple  Cortland        5          50      Fuji        1          25   
5  Apple  Cortland        5          50  Ambrosia       10          75   
6  Apple  Ambrosia       10          75      Fuji        1          25   
7  Apple  Ambrosia       10          75  Cortland        5          50   

      ratio  
1  0.500000  
2  0.333333  
3  2.000000  
5  0.666667  
6  3.000000  
7  1.500000

Also if need remove distance columns you can assign new column with DataFrame.pop for use and remove distance columns:

df = df.merge(df, on='type', suffixes=('1','2'))
df = df[df['subtype1'].ne(df['subtype2'])].copy()
df['ratio'] = df.pop('distance1').div(df.pop('distance2'))
print (df)
    type  subtype1  score1  subtype2  score2     ratio
1  Apple      Fuji       1  Cortland       5  0.500000
2  Apple      Fuji       1  Ambrosia      10  0.333333
3  Apple  Cortland       5      Fuji       1  2.000000
5  Apple  Cortland       5  Ambrosia      10  0.666667
6  Apple  Ambrosia      10      Fuji       1  3.000000
7  Apple  Ambrosia      10  Cortland       5  1.500000

Group DataFrame by column, perform operation on members, and output results in new DataFrame

Answers (1)

Related Questions