Reputation: 864
I have some DataFrame
:
df = pd.DataFrame({'type':['Apple', 'Apple', 'Apple'],
'subtype':['Fuji', 'Cortland', 'Ambrosia'],
'score':[1,5,10],
'distance':[25,50,75]})
I would like to group this DataFrame by type
, calculate the distance
ratios for all members (e.g. 25/50, 25/75, 50/75), and return the output of this operation in a new DataFrame, such as:
Upvotes: 1
Views: 48
Reputation: 863166
I believe you need cross join by type
first, then remove rows with same value of subtypes
by Series.ne
(!=
) with boolean indexing
and last create ratio column with DataFrame.assign
and division with Series.div
:
df = df.merge(df, on='type')
df = (df[df['subtype_x'].ne(df['subtype_y'])]
.assign(ratio=df['distance_x'].div(df['distance_y'])))
print (df)
type subtype_x score_x distance_x subtype_y score_y distance_y \
1 Apple Fuji 1 25 Cortland 5 50
2 Apple Fuji 1 25 Ambrosia 10 75
3 Apple Cortland 5 50 Fuji 1 25
5 Apple Cortland 5 50 Ambrosia 10 75
6 Apple Ambrosia 10 75 Fuji 1 25
7 Apple Ambrosia 10 75 Cortland 5 50
ratio
1 0.500000
2 0.333333
3 2.000000
5 0.666667
6 3.000000
7 1.500000
Also if need remove distance columns you can assign new column with DataFrame.pop
for use and remove distance columns:
df = df.merge(df, on='type', suffixes=('1','2'))
df = df[df['subtype1'].ne(df['subtype2'])].copy()
df['ratio'] = df.pop('distance1').div(df.pop('distance2'))
print (df)
type subtype1 score1 subtype2 score2 ratio
1 Apple Fuji 1 Cortland 5 0.500000
2 Apple Fuji 1 Ambrosia 10 0.333333
3 Apple Cortland 5 Fuji 1 2.000000
5 Apple Cortland 5 Ambrosia 10 0.666667
6 Apple Ambrosia 10 Fuji 1 3.000000
7 Apple Ambrosia 10 Cortland 5 1.500000
Upvotes: 2