Reputation: 6753
I have a Dataframe containing a name column and and a numerical column.
0 name number
1 a 3
2 a 6
3 b 2
4 b 500
5 c 349
6 c 200
What I want to do is define a new column that shows the numerical difference between two sequential rows that have the same name. This can be outputted onto the first row of the comparison I.e.:
0 name number diff
1 a 3 3
2 a 6 N/A
3 b 2 498
4 b 500 N/A
5 c 349 149
6 c 200 N/A
What is the best way of doing this using Pandas? As a bonus, if there is more than two rows with the same name, I want the highest numbered row compared against the lowest value row only.
Upvotes: 2
Views: 1721
Reputation: 862521
Use groupby
with diff
and abs
:
df['diff'] = df.groupby('name')['number'].diff().abs()
print (df)
0 name number diff
0 1 a 3 NaN
1 2 a 6 3.0
2 3 b 2 NaN
3 4 b 500 498.0
4 5 c 349 NaN
5 6 c 200 149.0
Or:
df['diff'] = df.groupby('name')['number'].diff(-1).abs()
print (df)
0 name number diff
0 1 a 3 3.0
1 2 a 6 NaN
2 3 b 2 498.0
3 4 b 500 NaN
4 5 c 349 149.0
5 6 c 200 NaN
For your another question need transform
:
print (df)
0 name number
0 1 a 3
1 2 a 6
2 2 a 8
3 3 b 2
4 4 b 500
5 4 b 600
6 5 c 349
7 6 c 200
df['diff'] = df.groupby('name')['number'].transform(lambda x: x.max() - x.min())
print (df)
0 name number diff
0 1 a 3 5
1 2 a 6 5
2 2 a 8 5
3 3 b 2 598
4 4 b 500 598
5 4 b 600 598
6 5 c 349 149
7 6 c 200 149
Or:
df = df.groupby('name')['number'].apply(lambda x: x.max() - x.min()).reset_index()
print (df)
name number
0 a 5
1 b 598
2 c 149
Upvotes: 2