Carl C
Carl C

Reputation: 135

How do I perform inter-row operations within a pandas.dataframe

How do I write the nested for loop to access every other row with respect to a row within a pandas.dataframe?

I am trying to perform some operations between rows in a pandas.dataframe The operation for my example code is calculating Euclidean distances between each row with each other row. The results are then saved into a some list in the form [(row_reference, name, dist)].

I understand how to access each row in a pandas.dataframe using df.itterrows() but I'm not sure how to access every other row with respect to the current row in order to perform the inter-row operation.

import pandas as pd
import numpy
import math

df = pd.DataFrame([{'name': "Bill", 'c1': 3, 'c2': 8}, {'name': "James", 'c1': 4, 'c2': 12},
                   {'name': "John", 'c1': 12, 'c2': 26}])

#Euclidean distance function where x1=c1_row1 ,x2=c1_row2, y1=c2_row1, #y2=c2_row2
def edist(x1, x2, y1, y2):
    dist = math.sqrt(math.pow((x1 - x2),2) + math.pow((y1 - y2),2))
    return dist

# Calculate Euclidean distance for one row (e.g. Bill) against each other row
# (e.g. "James" and "John"). Save results to a list (N_name, dist).

all_results = []

for index, row in df.iterrows():
    results = []
#   secondary loop to look for OTHER rows with respect to the current row
#        results.append(row2['name'],edist())
    all_results.append(row,results)

I hope to perform some operation edist() on all rows with respect to the current row/index.

I expect the loop to do the following:

In[1]:
result = []
result.append(['James',edist(3,4,8,12)])
result.append(['John',edist(3,12,8,26)])
results_all=[]
results_all.append([0,result])
result2 = []
result2.append(['John',edist(4,12,12,26)])
result2.append(['Bill',edist(4,3,12,8)])
results_all.append([1,result2])
result3 = []
result3.append(['Bill',edist(12,3,26,8)])
result3.append(['James', edist(12,4,26,12)])
results_all.append([2,result3])
results_all

With the following expected resulting output:

OUT[1]:
[[0, [['James', 4.123105625617661], ['John', 20.12461179749811]]],
 [1, [['John', 16.1245154965971], ['Bill', 4.123105625617661]]],
 [2, [['Bill', 20.12461179749811], ['James', 16.1245154965971]]]]

Upvotes: 2

Views: 1008

Answers (2)

Parfait
Parfait

Reputation: 107587

Consider shift and avoid any rowwise looping. And because you run straightforward arithmetic, run the expression directly on columns using help of numpy for vectorized calculation.

import numpy as np

df = (df.assign(c1_shift = lambda x: x['c1'].shift(1),
                c2_shift = lambda x: x['c2'].shift(1))
     )

df['dist'] = np.sqrt(np.power(df['c1'] - df['c1_shift'], 2) + 
                     np.power(df['c2'] - df['c2_shift'], 2))

print(df)
#     name  c1  c2  c1_shift  c2_shift       dist
# 0   Bill   3   8       NaN       NaN        NaN
# 1  James   4  12       3.0       8.0   4.123106
# 2   John  12  26       4.0      12.0  16.124515

Should you want every row combination with each other, consider a cross join on itself and query out reverse duplicates:

df = (pd.merge(df.assign(key=1), df.assign(key=1), on="key")
        .query("name_x < name_y")
        .drop(columns=['key'])
     )

df['dist'] = np.sqrt(np.power(df['c1_x'] - df['c1_y'], 2) +
                     np.power(df['c2_x'] - df['c2_y'], 2))

print(df)    
#   name_x  c1_x  c2_x name_y  c1_y  c2_y       dist
# 1   Bill     3     8  James     4    12   4.123106
# 2   Bill     3     8   John    12    26  20.124612
# 5  James     4    12   John    12    26  16.124515

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150735

If you data is not too long, you can check out scipy's distance_matrix:

all_results = pd.DataFrame(distance_matrix(df[['c1','c2']],df[['c1','c2']]),
                           index=df['name'],
                           columns=df['name'])

Output:

name        Bill      James       John
name                                  
Bill    0.000000   4.123106  20.124612
James   4.123106   0.000000  16.124515
John   20.124612  16.124515   0.000000

Upvotes: 1

Related Questions