Reputation:
I am trying to calculate the difference between certain rows based on the values in another column.
Using the example data frame below, I want to calculate the difference in Time based on the values in the Code column. Specifically, I want to loop through and determine the time difference between each B and the A that precedes it, i.e. Time at B - Time at A.
I can do this manually using iloc, but I was hoping to find a more efficient way, especially if I have to repeat this process numerous times.
import pandas as pd
import numpy as np
k = 5
N = 15
d = {'Time': np.random.randint(k, k + 100, size=N),
     'Code': ['A', 'x', 'B', 'x', 'A', 'x', 'B', 'x', 'A', 'x', 'B', 'x', 'A', 'x', 'B']}
df = pd.DataFrame(data=d)
Output:
Code Time
0 A 89
1 x 39
2 B 24
3 x 62
4 A 83
5 x 57
6 B 69
7 x 10
8 A 87
9 x 62
10 B 86
11 x 11
12 A 54
13 x 44
14 B 71
Expected Output:
diff
1 -65
2 -14
3 -1
4 17
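Because the snippet above draws Time with np.random.randint, rerunning it will not reproduce the numbers printed here. A minimal sketch, assuming the printed Output is the intended sample, that rebuilds the same frame deterministically and computes the expected differences by hand with iloc (the manual approach mentioned in the question; the hard-coded row positions are read off the printed Output):

```python
import pandas as pd

# Values copied from the printed Output above, so results are reproducible.
df = pd.DataFrame({
    'Code': ['A', 'x', 'B', 'x', 'A', 'x', 'B', 'x', 'A', 'x', 'B', 'x', 'A', 'x', 'B'],
    'Time': [89, 39, 24, 62, 83, 57, 69, 10, 87, 62, 86, 11, 54, 44, 71],
})

# Manual version: rows 0, 4, 8, 12 hold 'A' and rows 2, 6, 10, 14 hold 'B'.
pairs = [(0, 2), (4, 6), (8, 10), (12, 14)]
manual = [int(df['Time'].iloc[b] - df['Time'].iloc[a]) for a, b in pairs]
print(manual)  # [-65, -14, -1, 17]
```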
Upvotes: 4
Views: 2938
Reputation: 863331
First filter with boolean indexing, then call reset_index(drop=True) so that the Series a and b share a default index and align, subtract them with sub, and finally, if you want a one-column DataFrame, add to_frame:
a = df.loc[df['Code'] == 'A', 'Time'].reset_index(drop=True)
b = df.loc[df['Code'] == 'B', 'Time'].reset_index(drop=True)
Similar alternative solution:
a = df.loc[df['Code'] == 'A'].reset_index()['Time']
b = df.loc[df['Code'] == 'B'].reset_index()['Time']
c = b.sub(a).to_frame('diff')
print (c)
diff
0 -65
1 -14
2 -1
3 17
Finally, to make the new index start from 1, add rename:
c = b.sub(a).to_frame('diff').rename(lambda x: x + 1)
print (c)
diff
1 -65
2 -14
3 -1
4 17
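Since the question mentions repeating this process numerous times, the steps above could be wrapped in a small function. This is only a sketch: code_diff and its parameter names are hypothetical, and it assumes both codes occur the same number of times and should be paired in order of appearance (unmatched rows would come out as NaN).

```python
import pandas as pd

def code_diff(df, first='A', second='B', code_col='Code', value_col='Time'):
    """Hypothetical helper: value at `second` minus value at `first`,
    paired by order of appearance."""
    a = df.loc[df[code_col] == first, value_col].reset_index(drop=True)
    b = df.loc[df[code_col] == second, value_col].reset_index(drop=True)
    # Shift the default 0-based index so that it starts from 1.
    return b.sub(a).to_frame('diff').rename(lambda x: x + 1)

# Sample values copied from the question's printed output.
df = pd.DataFrame({
    'Code': ['A', 'x', 'B', 'x', 'A', 'x', 'B', 'x', 'A', 'x', 'B', 'x', 'A', 'x', 'B'],
    'Time': [89, 39, 24, 62, 83, 57, 69, 10, 87, 62, 86, 11, 54, 44, 71],
})

c = code_diff(df)
print(c)
```

The same call with other arguments, for example code_diff(df, first='A', second='x'), would pair different codes without repeating the filtering code.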
Another approach, useful if you need to compute several differences, is to reshape with unstack:
df = df.set_index(['Code', df.groupby('Code').cumcount() + 1])['Time'].unstack()
print (df)
1 2 3 4 5 6 7
Code
A 89.0 83.0 87.0 54.0 NaN NaN NaN
B 24.0 69.0 86.0 71.0 NaN NaN NaN
x 39.0 62.0 57.0 10.0 62.0 11.0 44.0
# finally, drop the rows that are NaN
c = df.loc['B'].sub(df.loc['A']).dropna()
print (c)
1 -65.0
2 -14.0
3 -1.0
4 17.0
dtype: float64
# subtracting rows that contain NaN: fill_value=0 treats a missing value as 0, so the result is not NaN
d = df.loc['x'].sub(df.loc['A'], fill_value=0)
print (d)
1 -50.0
2 -21.0
3 -30.0
4 -44.0
5 62.0
6 11.0
7 44.0
dtype: float64
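Note that the reshape above overwrites df. A sketch of an equivalent formulation that keeps the original frame intact, using assign and pivot instead of set_index and unstack (this is a swapped-in variant, not the answer's own code; the names n, wide and diffs are introduced here for illustration):

```python
import pandas as pd

# Sample values copied from the question's printed output.
df = pd.DataFrame({
    'Code': ['A', 'x', 'B', 'x', 'A', 'x', 'B', 'x', 'A', 'x', 'B', 'x', 'A', 'x', 'B'],
    'Time': [89, 39, 24, 62, 83, 57, 69, 10, 87, 62, 86, 11, 54, 44, 71],
})

# Number each occurrence of a code (1, 2, 3, ...) and spread them into columns.
wide = (df.assign(n=df.groupby('Code').cumcount() + 1)
          .pivot(index='Code', columns='n', values='Time'))

# Any pair of codes can now be compared row against row.
diffs = wide.loc['B'].sub(wide.loc['A']).dropna()
print(diffs)
```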
Upvotes: 2
Reputation: 76366
Assuming your Code column is a repeat of 'A', 'x', 'B', 'x', you can just use
>>> (df.Time[df.Code == 'B'].reset_index() - df.Time[df.Code == 'A'].reset_index())[['Time']]
Time
0 -65
1 -14
2 -1
3 17
But note that the original assumption, that 'A' and 'B' values alternate, seems fragile.
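If strict alternation cannot be relied on, one possible way to make the pairing explicit is to start a new block at every 'A' and pair it with the first 'B' inside that block. This is a sketch under that assumed pairing rule, not something stated in the question; the block column and the irregular sample data below are made up for illustration.

```python
import pandas as pd

# Made-up data where 'A' and 'B' do not alternate cleanly.
df = pd.DataFrame({
    'Code': ['B', 'A', 'x', 'B', 'A', 'A', 'x', 'B', 'B'],
    'Time': [5, 10, 1, 15, 20, 30, 2, 50, 70],
})

# The counter increases at every 'A', so each 'A' opens a new block.
df = df.assign(block=(df['Code'] == 'A').cumsum())

a = df.loc[df['Code'] == 'A'].set_index('block')['Time']
# First 'B' seen in each block; a 'B' before any 'A' lands in block 0.
b = df.loc[df['Code'] == 'B'].groupby('block')['Time'].first()

# Blocks lacking either an 'A' or a 'B' become NaN and are dropped.
diff = b.sub(a).dropna().astype(int)
print(diff)  # block 1 -> 5, block 3 -> 20
```

Here the leading 'B' (no preceding 'A'), the 'A' with no following 'B', and the second consecutive 'B' are all ignored rather than silently mispaired.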
If you want the index to run from 1 to 4, as in your question, you can assign the previous result to diff, and then use
>>> diff.index += 1
>>> diff
Time
1 -65
2 -14
3 -1
4 17
Upvotes: 2