Philipp Steinle
Philipp Steinle

Reputation: 27

Pandas calculate minimum for pairs of rows

I have the following dataframe:

data = {'col1': ['A', 'B', 'A', 'B', "A", "B"], 
        'col2': ["0", "2", "0", "1", "0", "0.5"]}  
df = pd.DataFrame.from_dict(data)

df 

   col1 col2
0   A   0
1   B   2
2   A   0
3   B   1
4   A   0
5   B   0.5

There are three pairs of rows (A,B). For each pair I calculate the absolute difference of the numbers in col2. My goal is to get the minimum absolute difference of the three pairs and the corresponding Index. In this case this is 0.5 respectively 4.

I already tried:

(df[df["col1"] == "A"]["col2"] - df[df["col1"] == "B"]["col2"]).abs().min()

But I got a problem with the index.

Does anyone have an idea? Thanks.

Upvotes: 0

Views: 158

Answers (4)

wwii
wwii

Reputation: 23753

To fix the indexing problem you asked about, use .loc with the comparison expression as the row indexer and 'col2' for the column indexer. I've added astype to enable the math later.

>>> x = df.loc[df.col1=='A','col2'].astype(float)
>>> y = df.loc[df.col1=='B','col2'].astype(float)
>>> x
0    0.0
2    0.0
4    0.0
Name: col2, dtype: float64
>>> y
1    2.0
3    1.0
5    0.5
Name: col2, dtype: float64

To subtract the resultant DataFrames reindex the 'B' DataFrame like the 'A' DataFrame to ensure you can recover the original index of the 'A' row.

>>> z = x - y.reindex_like(x,method='bfill')
>>> z
0   -2.0
2   -1.0
4   -0.5
Name: col2, dtype: float64

Extract what you are looking for.

>>> z.abs().agg(['min', 'idxmin'])
min       0.5
idxmin    4.0
Name: col2, dtype: float64
>>>

Unfortunately it is not a one-liner.

Upvotes: 1

PieCot
PieCot

Reputation: 3639

In one line of code:

df.col2.diff().abs().shift(-1)[::2].agg(['idxmin', 'min']).values.tolist()

It returns a list with:

  • the index of the row corresponding to the start of the sequence A-B with the minimum absolute difference;
  • the value of the absolute difference.

Here the output:

[4.0, 0.5]

Upvotes: 1

BENY
BENY

Reputation: 323226

Try with

s = df.iloc[::-1].groupby(df.index//2).col2.diff().abs()
out = s.agg(['min','idxmin'])
Out[193]: 
min       0.5
idxmin    4.0
Name: col2, dtype: float64

Upvotes: 3

Jorge
Jorge

Reputation: 2239

I think you are looking for this:

import numpy as np
df.loc[:,'col2'] = df.col2.astype(np.float)
df[(df.col2 == min(df[df.col2 > 0].col2)) ]

Correct?

Upvotes: 1

Related Questions