taway0282

How to create a matrix of distances

I have a pandas DataFrame `df` with 4 columns of data and a function, `manhattan_distance(a, b)`, that computes the distance between two rows of the DataFrame. I want to create a matrix of the distances between every pair of rows (I don't mind having repeated pairs like (0, 1) and (1, 0)). I thought of creating an empty DataFrame and filling it in, but I am unsure how to do this. Any help would be appreciated!

The DataFrame looks like this:

    A  |  B  |  C  | D  |
0   12 |  22 | 112 |  9 |
1   14 |  47 |  71 | 18 |
2    5 | 109 |  63 | 20 |
...

The output should look like this:

     0  | 1  |  2  | ...
0     0 | 77 | 154 |
1    77 |  0 |  81 |
2   154 | 81 |   0 |
...
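Each entry in this matrix is the Manhattan distance between two rows, i.e. the sum of the column-wise absolute differences. Checking the sample values against the rows shown above:

$$
\begin{aligned}
d(0, 1) &= |12-14| + |22-47| + |112-71| + |9-18| = 2 + 25 + 41 + 9 = 77\\
d(0, 2) &= |12-5| + |22-109| + |112-63| + |9-20| = 7 + 87 + 49 + 11 = 154\\
d(1, 2) &= |14-5| + |47-109| + |71-63| + |18-20| = 9 + 62 + 8 + 2 = 81
\end{aligned}
$$

The diagonal is 0 and the matrix is symmetric, since $d(i, j) = d(j, i)$.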

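The distance functions:

```python
import numpy as np

def absolute_difference(a, b):
    return abs(a - b)

def manhattan_distance(a, b):
    # Flatten each row to a plain list; this works whether a row is passed
    # as a one-row DataFrame (df.iloc[[i]]) or as a Series (df.iloc[i])
    a_list = np.asarray(a).ravel().tolist()
    b_list = np.asarray(b).ravel().tolist()
    d = 0
    for x, y in zip(a_list, b_list):
        d += absolute_difference(x, y)
    return d
```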
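For comparison, the same distance can be written as a one-liner over paired column values. This is a minimal sketch assuming each row is given as a plain sequence of numbers of equal length (a list here; iterating a pandas Series also yields its values). The name `manhattan` is hypothetical, chosen only to avoid clashing with the function above.

```python
def manhattan(a, b):
    """Sum of absolute column-wise differences between two equal-length rows."""
    # zip pairs up the columns; note it silently stops at the shorter row
    return sum(abs(x - y) for x, y in zip(a, b))


# Rows 0, 1 and 2 from the sample DataFrame above
row0 = [12, 22, 112, 9]
row1 = [14, 47, 71, 18]
row2 = [5, 109, 63, 20]

print(manhattan(row0, row1))  # 77
print(manhattan(row0, row2))  # 154
print(manhattan(row1, row2))  # 81
```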

Answers (1)

supercooler8

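There is probably a more "pythonic" answer using `apply`, but here is a straightforward loop. It assumes `manhattan_distance` accepts each row as a pandas Series, which is what `df.iloc[i, :]` returns:

```python
import numpy as np
import pandas as pd

n = df.shape[0]
distance = pd.DataFrame(np.zeros((n, n)), index=df.index, columns=df.index)

# The distance is symmetric, so compute each pair once and fill both cells
for i in range(n):
    for j in range(i, n):
        d = manhattan_distance(df.iloc[i, :], df.iloc[j, :])
        distance.iloc[i, j] = d
        distance.iloc[j, i] = d
```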

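Edit: here is a version using `apply`. The outer `apply` passes each row `x` as a Series, and the inner `apply` computes its distance to every row `y`, so each outer call returns one row of the matrix:

```python
distance = df.apply(lambda x: df.apply(lambda y: manhattan_distance(x, y), axis=1), axis=1)
```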
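Both versions call `manhattan_distance` once per pair, which means n² Python-level calls. If the DataFrame is large, a different technique, NumPy broadcasting, can build the whole matrix in one array operation. This is a sketch assuming all columns are numeric; the helper name `manhattan_matrix` is hypothetical.

```python
import numpy as np
import pandas as pd


def manhattan_matrix(df):
    """Pairwise Manhattan distances between the rows of a numeric DataFrame."""
    X = df.to_numpy()
    # X[:, None, :] has shape (n, 1, k) and X[None, :, :] has shape (1, n, k);
    # broadcasting gives an (n, n, k) array of column-wise differences
    diffs = np.abs(X[:, None, :] - X[None, :, :])
    # Summing over the column axis leaves the (n, n) distance matrix,
    # labelled with the original row index on both axes
    return pd.DataFrame(diffs.sum(axis=-1), index=df.index, columns=df.index)


# The sample rows from the question
df = pd.DataFrame(
    {"A": [12, 14, 5], "B": [22, 47, 109], "C": [112, 71, 63], "D": [9, 18, 20]}
)
print(manhattan_matrix(df))
```

The intermediate array holds n × n × k values, so memory grows quadratically with the number of rows; for very large frames the loop above (or processing rows in chunks) may be the better trade-off.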
