How do I perform inter-row operations within a pandas.dataframe

Question

How do I write the nested for loop to access every other row with respect to a row within a pandas.dataframe?

I am trying to perform some operations between rows in a pandas.dataframe. The operation in my example code is calculating the Euclidean distance between each row and every other row. The results are then saved into a list in the form [(row_reference, name, dist)].

I understand how to access each row in a pandas.dataframe using df.iterrows(), but I'm not sure how to access every other row with respect to the current row in order to perform the inter-row operation.

import pandas as pd
import numpy
import math

df = pd.DataFrame([{'name': "Bill", 'c1': 3, 'c2': 8}, {'name': "James", 'c1': 4, 'c2': 12},
                   {'name': "John", 'c1': 12, 'c2': 26}])

# Euclidean distance function where x1=c1_row1, x2=c1_row2, y1=c2_row1, y2=c2_row2
def edist(x1, x2, y1, y2):
    dist = math.sqrt(math.pow((x1 - x2),2) + math.pow((y1 - y2),2))
    return dist

# Calculate Euclidean distance for one row (e.g. Bill) against each other row
# (e.g. "James" and "John"). Save results to a list (N_name, dist).

all_results = []

for index, row in df.iterrows():
    results = []
    # secondary loop to look for OTHER rows with respect to the current row
    #     results.append([row2['name'], edist()])
    all_results.append([index, results])

I hope to perform some operation edist() on all rows with respect to the current row/index.

I expect the loop to do the following:

In[1]:
result = []
result.append(['James',edist(3,4,8,12)])
result.append(['John',edist(3,12,8,26)])
results_all=[]
results_all.append([0,result])
result2 = []
result2.append(['John',edist(4,12,12,26)])
result2.append(['Bill',edist(4,3,12,8)])
results_all.append([1,result2])
result3 = []
result3.append(['Bill',edist(12,3,26,8)])
result3.append(['James', edist(12,4,26,12)])
results_all.append([2,result3])
results_all

With the following expected resulting output:

OUT[1]:
[[0, [['James', 4.123105625617661], ['John', 20.12461179749811]]],
 [1, [['John', 16.1245154965971], ['Bill', 4.123105625617661]]],
 [2, [['Bill', 20.12461179749811], ['James', 16.1245154965971]]]]

Quang Hoang · Accepted Answer

If your data is not too large, you can use scipy's distance_matrix, which computes all pairwise distances at once:

from scipy.spatial import distance_matrix

all_results = pd.DataFrame(distance_matrix(df[['c1','c2']],df[['c1','c2']]),
                           index=df['name'],
                           columns=df['name'])

Output:

name        Bill      James       John
name                                  
Bill    0.000000   4.123106  20.124612
James   4.123106   0.000000  16.124515
John   20.124612  16.124515   0.000000
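
If you need the nested-list format from the question rather than a matrix, here is a minimal sketch of the explicit nested loop. It assumes the default integer index and uses a second df.iterrows() pass that skips the current row. The neighbours come out in DataFrame order, so for James the list is Bill then John, which differs from the rotated order in the question's expected output.

```python
import math
import pandas as pd

df = pd.DataFrame([
    {'name': "Bill", 'c1': 3, 'c2': 8},
    {'name': "James", 'c1': 4, 'c2': 12},
    {'name': "John", 'c1': 12, 'c2': 26},
])

def edist(x1, x2, y1, y2):
    # Euclidean distance between the points (x1, y1) and (x2, y2)
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

all_results = []
for index, row in df.iterrows():
    results = []
    # Inner loop over the same DataFrame, skipping the current row
    for index2, row2 in df.iterrows():
        if index2 == index:
            continue
        dist = edist(row['c1'], row2['c1'], row['c2'], row2['c2'])
        results.append([row2['name'], dist])
    all_results.append([index, results])

print(all_results)
```

This loop does O(n²) Python-level work, so it is only reasonable for small frames; the distance_matrix approach above scales better.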
