user2920100
user2920100

Reputation: 89

Subtraction of two series from different parts of the dataframe

I have the following data frame:

        SID     AID          START           END
71       1       1          -11136          -11122   
74       1       1          -11121          -11109   
78       1       1          -11034          -11014   
79       1       2          -11137          -11152   
83       1       2          -11114          -11127   
86       1       2          -11032          -11038
88       1       2          -11121          -11002  

I want to do a subtraction of the START elements with AID==1 and AID==2, in order, such that the expected result would be:

-11136 - (-11137) = 1

-11121 - (-11114) =-7

-11034 - (-11032) =-2

Nan    - (-11002) = NaN

So I extracted two groups:

values1 = group.loc[group['AID'] == 1]["START"]
values2 = group.loc[group['AID'] == 2]["START"]

with the following result:

71   -11136
74   -11121
78   -11034
Name: START, dtype: int64
79   -11137
83   -11114
86   -11032
88   -11002
Name: START, dtype: int64

and did a simple subtraction:

values1-values2

But I got all NaNs:

71   NaN
74   NaN
78   NaN
79   NaN
83   NaN
86   NaN

I noticed that if I use data from the same AID group (e.g. START-END), I get the right answer. I get the NaN only when I "mix" AID group. I'm just getting started with Pandas, but I'm obviously missing something here. Any suggestion?

Upvotes: 1

Views: 1100

Answers (3)

piRSquared
piRSquared

Reputation: 294508

Bizarre way to go about it

-np.diff([g.reset_index(drop=True) for n, g in df.groupby('AID').START])[0]

0    1.0
1   -7.0
2   -2.0
3    NaN
Name: START, dtype: float64

Upvotes: 0

Scott Boston
Scott Boston

Reputation: 153510

Let's try this:

df.set_index([df.groupby(['SID','AID']).cumcount(),'AID'])['START'].unstack().add_prefix('col_').eval('col_1 - col_2')

Output:

0    1.0
1   -7.0
2   -2.0
3    NaN
dtype: float64

Upvotes: 2

user2285236
user2285236

Reputation:

pandas does those operations based on labels. Since your labels ((71, 74, 78) and (79, 83, 86)) don't match, it cannot find any value to subtract. One way to deal with this is to use a numpy array instead of a Series so there is no label associated:

values1 - values2.values
Out: 
71    1
74   -7
78   -2
Name: START, dtype: int64

Upvotes: 1

Related Questions