Reputation: 89
I have the following data frame:
SID AID START END
71 1 1 -11136 -11122
74 1 1 -11121 -11109
78 1 1 -11034 -11014
79 1 2 -11137 -11152
83 1 2 -11114 -11127
86 1 2 -11032 -11038
88 1 2 -11121 -11002
I want to do a subtraction of the START elements with AID==1 and AID==2, in order, such that the expected result would be:
-11136 - (-11137) = 1
-11121 - (-11114) =-7
-11034 - (-11032) =-2
Nan - (-11002) = NaN
So I extracted two groups:
values1 = group.loc[group['AID'] == 1]["START"]
values2 = group.loc[group['AID'] == 2]["START"]
with the following result:
71 -11136
74 -11121
78 -11034
Name: START, dtype: int64
79 -11137
83 -11114
86 -11032
88 -11002
Name: START, dtype: int64
and did a simple subtraction:
values1-values2
But I got all NaNs:
71 NaN
74 NaN
78 NaN
79 NaN
83 NaN
86 NaN
I noticed that if I use data from the same AID group (e.g. START-END), I get the right answer. I get the NaN only when I "mix" AID group. I'm just getting started with Pandas, but I'm obviously missing something here. Any suggestion?
Upvotes: 1
Views: 1100
Reputation: 294508
Bizarre way to go about it
-np.diff([g.reset_index(drop=True) for n, g in df.groupby('AID').START])[0]
0 1.0
1 -7.0
2 -2.0
3 NaN
Name: START, dtype: float64
Upvotes: 0
Reputation: 153510
Let's try this:
df.set_index([df.groupby(['SID','AID']).cumcount(),'AID'])['START'].unstack().add_prefix('col_').eval('col_1 - col_2')
Output:
0 1.0
1 -7.0
2 -2.0
3 NaN
dtype: float64
Upvotes: 2
Reputation:
pandas does those operations based on labels. Since your labels ((71, 74, 78) and (79, 83, 86)) don't match, it cannot find any value to subtract. One way to deal with this is to use a numpy array instead of a Series so there is no label associated:
values1 - values2.values
Out:
71 1
74 -7
78 -2
Name: START, dtype: int64
Upvotes: 1