I have a pandas dataframe like this, where place1 and place2 are the index levels. I want to sort index place2 by the data value without disrupting the order of index place1, that is, the place2 column becomes a, c, b for each place1. Is there a way to do this?
place1 place2 data
sss    a      100
       b      300
       c      200
ses    a      400
       b      600
       c      500
...    ...    ...
I also want to calculate the data difference between place2 == b and place2 == a for each place1, that is, the dataframe becomes
place1 place2 data diff
sss    a      100  200
       b      300
       c      200
ses    a      400  200
       b      600
       c      500
...    ...    ...  ...
Is there an efficient way to calculate the difference like this? Thanks in advance!
Upvotes: 0
Views: 293
You can .groupby() and then calculate the diff of each group:
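# Group by the first index level (place1), keeping the original group order.
groups = df.groupby(level=0, group_keys=False, sort=False)
# Per group: data at place2 == 'b' minus data at place2 == 'a', broadcast to every row.
df['diff'] = groups['data'].transform(lambda x: x.xs('b', level='place2').iloc[0] - x.xs('a', level='place2').iloc[0])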
It adds the value to each row of the group - but you can remove the "duplicates" afterwards if needed.
>>> df
               data  diff
place1 place2
sss    a        100   200
       b        300   200
       c        200   200
ses    a        400   200
       b        600   200
       c        500   200
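If you want diff to appear only on the first row of each place1 group, as in the question's desired output, one possible way is to blank out the other rows with cumcount(). This is only a sketch: the frame is rebuilt from the values in the question, and the name first_in_group is made up for illustration.

```python
import pandas as pd

# Rebuild the example frame (values copied from the question).
index = pd.MultiIndex.from_product(
    [["sss", "ses"], ["a", "b", "c"]], names=["place1", "place2"]
)
df = pd.DataFrame({"data": [100, 300, 200, 400, 600, 500]}, index=index)

# Per-group difference b - a, broadcast to every row of the group.
groups = df.groupby(level="place1", sort=False)
df["diff"] = groups["data"].transform(
    lambda s: s.xs("b", level="place2").iloc[0] - s.xs("a", level="place2").iloc[0]
)

# cumcount() numbers the rows inside each group starting at 0,
# so == 0 marks the first row of every place1 group.
first_in_group = df.groupby(level="place1", sort=False).cumcount() == 0

# Keep diff on the first row only; the remaining rows become NaN.
df["diff"] = df["diff"].where(first_in_group)
print(df)
```

Note that the column becomes float because of the NaN values; whether that matters depends on what you do with it afterwards.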
You can then sort the data column within each group:
>>> groups.apply(lambda x: x.sort_values('data'))
               data  diff
place1 place2
sss    a        100   200
       c        200   200
       b        300   200
ses    a        400   200
       c        500   200
       b        600   200
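Since the question asks about efficiency: on a large frame, apply with a Python lambda per group can be slow. A possible alternative, sketched here under the assumption that the rows of each place1 are contiguous as in the example, is to number the groups by first appearance and do a single sort_values. The helper column _group_order is a made-up name, not something from the question.

```python
import pandas as pd

# Rebuild the example frame (values copied from the question).
index = pd.MultiIndex.from_product(
    [["sss", "ses"], ["a", "b", "c"]], names=["place1", "place2"]
)
df = pd.DataFrame({"data": [100, 300, 200, 400, 600, 500]}, index=index)

# factorize() numbers the place1 labels in order of first appearance,
# so sorting on that number keeps sss before ses.
group_order = pd.factorize(df.index.get_level_values("place1"))[0]

sorted_df = (
    df.assign(_group_order=group_order)       # temporary helper column
      .sort_values(["_group_order", "data"])  # group order first, then data
      .drop(columns="_group_order")
)
print(sorted_df)
```

This avoids the per-group Python call, but I have not benchmarked it against the apply version, so treat the speed benefit as something to measure on your own data.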
Upvotes: 1