Python Pandas Dataframe - compute difference between rows and take the minimum one

Question

I have a Pandas Dataframe D storing a large database. I also have a smaller DataFrame C with 10 rows, containing exactly the same columns as the main one, including column 'price'. For each row r in the main dataframe D I want to find a row t in C which is closest, i.e., the difference between r.price and t.price is minimal. As an output I want to have:

I have a function computing the difference

def difference(self, row1, row2):
    d = np.abs(row1["price"] - row2["price"])
    return d

And I want to use apply function to apply the difference function to each row in C, for each row v in D:

for _, v in D.iterrows():
    C.apply(self.difference, axis=1, args=(v,))

But I don't know how I should find the row of C for which the difference was minimal. I was thinkking about the min build-in function from Python, but I don't know how to apply it correctly for dataframes.

An example: Let say I have a data D

    id | name | price
1.  bdb | AAA | 2.34
2.  441 | BBB | 3.56
3.  ae9 | CCC | 1.27
4.  fb9 | DDD | 9.78
5.  e6b | EEE | 5.13
6.  bx4 | FFF | 6.23
7.  a9a | GGG | 9.56
8.  847 | HHH | 9.23
9.  e4c | III | 0.45
...
200. eb3 | XYZ | 10.34

And C (for simplicity with just 5 rows) as below

    id | name | price
1.  xyh | AA1 | 0.34
2.  y5h | BB1 | 9.77
3.  af6 | CC1 | 3.24
4.  op9 | DD1 | 6.34
5.  23h | EE1 | 0.20

So, the output of my function should be as follows:

Row bdb in D should be matched with row af6
Row 441 in D should be matched with row af6
Row ae9 in D should be matched with row xyh 
Row fb9 in D should be matched with row y5h
Row e6b in D should be matched with row op9
etc.

Python Pandas Dataframe - compute difference between rows and take the minimum one

Answers (1)

Related Questions