Reputation: 475

Calculate Minimum Distance of all Rows Fulfilling Condition

I am using pandas in python and want to do the following: I want to introduce a new column A in my dataframe. For calculating it I want to consider all rows which have in column B the same value as my "current element" (I think that's a part where im stuck right now) in column B, and then take the minimum value of column C minus the value of the current element of C of all these - and exclude the difference 0, namely a self-reference.

As an example:

   B  C    A
0  0  1.2  1.7  (calculation: possible rows are 1 and 2 (all have B = 0), the differences are 2.9 - 1.2 and 3.0 - 1.2 => min = 1.7
1  0  2.9  -1.7 (min difference is 1.2 - 2.9)
2  0  3.0  -1.8
3  1  4.1  1.4
4  1  5.5  -1.4

Thank you!

Upvotes: 4

Answers (4)

Gemini

Reputation: 475

Thanks all for the helpful answers. My prefered solution now is this:

res = df.sort_values(['B', 'C'])
res.loc[res.B.eq(res.shift(1).B), 'A'] = res.C - res.shift(1).C
df = pd.merge(df, res, on=['B', 'C'])

What do you think? So then I get all smaller / bigger values and could merge those.

Upvotes: 0

Scott Boston

Reputation: 153460

Edit to handle new dataset:

df.groupby('B')['C'].transform(lambda x: np.where(x.idxmin() == x.index,
                                                  x.nsmallest(2).iloc[1]-x,
                                                  (x[x.idxmin()] - x)))

Output:

   B    C    A  A_new
0  0  1.2  1.7    1.7
1  0  2.9 -1.7   -1.7
2  0  3.0 -1.8   -1.8
3  1  4.1  1.4    1.4
4  1  5.5 -1.0   -1.4

IIUC, think you want this, however I am not sure about the 1 in column A. This is the first row in each group. I replace the 0 with 1.

df['A_new'] = df.groupby('B')['C'].transform(lambda x: (x[x.idxmin()] - x).replace(0,1))

Output:

   B  C  A  A_new
0  0  1  1      1
1  0  2 -1     -1
2  0  3 -2     -2
3  1  4  1      1
4  1  5 -1     -1

Timings:

Your solution:

%timeit df.apply(lambda x: df[(df.B == x.B) & (~df.C.eq(x.C))].min().C - x.C, axis=1)

100 loops, best of 3: 9.78 ms per loop

This solution:

%timeit df.groupby('B')['C'].transform(lambda x: np.where(x.idxmin() == x.index,1,(x[x.idxmin()] - x)))

100 loops, best of 3: 3.58 ms per loop

Upvotes: 1

Bharath M Shetty

Reputation: 30605

Transform min and subtract C

df['new'] = (df.groupby('B')['C'].transform('min')-df['C']).replace(0,1)

   B  C  A  new
0  0  1  1    1
1  0  2 -1   -1
2  0  3 -2   -2
3  1  4  1    1
4  1  5 -1   -1

Edit based on updated dataframe :

g = df.groupby('B')
diff = g['C'].transform('min') - df['C']
df['new'] = diff.where(diff!=0,np.nan)
df['new'] = df['new'].fillna(df['new'].abs().groupby(df['B']).transform('min'))

  B    C    A  new
0  0  1.2  1.7  1.7
1  0  2.9 -1.7 -1.7
2  0  3.0 -1.8 -1.8
3  1  4.1  1.4  1.4
4  1  5.5 -1.4 -1.4

Upvotes: 1

BENY

Reputation: 323226

It is hard to understand , but work ...

df['new'] = df.B.map(df.groupby('B').C.apply(list))

df.apply(lambda x :min(list(map(lambda y: y - x['C'],list(set(x['new'])-set([x['C']]))))),axis=1)


Out[1013]: 
0    1
1   -1
2   -2
3    1
4   -1
dtype: int64

More info :

df['NewA']=df.apply(lambda x :min(list(map(lambda y: y - x['C'],list(set(x['new'])-set([x['C']]))))),axis=1)
df
Out[1015]: 
   B  C  A        new  NewA
0  0  1  1  [1, 2, 3]     1
1  0  2 -1  [1, 2, 3]    -1
2  0  3 -2  [1, 2, 3]    -2
3  1  4  1     [4, 5]     1
4  1  5 -1     [4, 5]    -1

Let us using numpy approach

A = df.C.values[:, None] - df.C.values.T
np.fill_diagonal(A, 9999999)
G=df.groupby('B')
np.concatenate([np.min(A[y.min():y.max()+1,y.min():y.max()+1],0) for _, y in G.groups.items()])

Time

%timeit df.apply(lambda x: df[(df.B == x.B) & (~df.C.eq(x.C))].min().C - x.C, axis=1)
100 loops, best of 3: 4.14 ms per loop
%timeit df.groupby('B')['C'].transform(lambda x: np.where(x.idxmin() == x.index,1,(x[x.idxmin()] - x)))
100 loops, best of 3: 1.67 ms per loop

def fff(x):
    A = df.C.values[:, None] - df.C.values.T
    np.fill_diagonal(A, 9999999)
    G=df.groupby('B')
    np.concatenate([np.min(A[y.min():y.max()+1,y.min():y.max()+1],0) for _, y in G.groups.items()])
%timeit fff(1)
1000 loops, best of 3: 758 µs per loop

Upvotes: 1

Calculate Minimum Distance of all Rows Fulfilling Condition

Answers (4)

Edit to handle new dataset:

Related Questions