Philipp O.

Reputation: 41

List of pairwise distances from full pairwise distance matrix

All the answers I have found deal with the opposite problem: building a pairwise distance matrix from a list of pairwise distances. Suppose I have a full pairwise distance matrix in this format:


    1       2       3       4       5
1   0.000   1.154   1.235   1.297   0.960   
2   1.154   0.000   0.932   0.929   0.988
3   1.235   0.932   0.000   0.727   1.244
4   1.297   0.929   0.727   0.000   1.019
5   0.960   0.988   1.244   1.019   0.000

In the end I need a list of pairwise distances like this:

1   2   1.154
1   3   1.235
1   4   1.297
1   5   0.960
2   3   0.932
... ... ... 

How should I proceed? I haven't yet found a function to "slice" the dataframe above the diagonal of 0s. It would also be great if you could give me a hint about how well this scales with matrix size, because the original distance matrix is about 100,000 x 100,000 items.

Many, many thanks in advance!

I figured out that I can use the pandas DataFrame.stack() function to get the list. But then every distance appears twice, because the matrix is symmetric, and I also get the distances of 0 between each item and itself.
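
For reference, the stack() idea can be made to work by masking out the diagonal and the lower triangle first. This is only a minimal sketch on the 5x5 example above; the names upper, pairs and out, and the column names row, col and distance, are made up for illustration. It also assumes the matrix contains no genuine NaN distances, since the final dropna() would discard those too.

```python
import numpy as np
import pandas as pd

labels = [1, 2, 3, 4, 5]
df = pd.DataFrame(
    [[0.000, 1.154, 1.235, 1.297, 0.960],
     [1.154, 0.000, 0.932, 0.929, 0.988],
     [1.235, 0.932, 0.000, 0.727, 1.244],
     [1.297, 0.929, 0.727, 0.000, 1.019],
     [0.960, 0.988, 1.244, 1.019, 0.000]],
    index=labels, columns=labels,
)

# Boolean mask that is True only strictly above the diagonal.
upper = np.triu(np.ones(df.shape, dtype=bool), k=1)

# where() replaces the diagonal and lower triangle with NaN,
# stack() flattens to a Series indexed by (row label, column label),
# dropna() removes the masked cells regardless of pandas version.
pairs = df.where(upper).stack().dropna()

# Optional: turn the MultiIndex into ordinary columns.
out = pairs.rename_axis(["row", "col"]).reset_index(name="distance")
print(out)
```

This builds a full-size mask and a full-size intermediate frame, so it is convenient for small matrices but is unlikely to be a good fit for the 100,000 x 100,000 case.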

Upvotes: 1

Views: 320

Answers (2)

mozway

Reputation: 262284

Use numpy's triu_indices_from:

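import numpy as np
import pandas as pd

# df is the square distance matrix from the question
a = df.to_numpy()
# row/column positions strictly above the diagonal (k=1 skips the zeros)
idx, col = np.triu_indices_from(a, k=1)

out = pd.DataFrame({'index': df.index[idx],
                    'columns': df.columns[col],
                    'value': a[idx, col]})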

Output:

   index columns  value
0      1       2  1.154
1      1       3  1.235
2      1       4  1.297
3      1       5  0.960
4      2       3  0.932
5      2       4  0.929
6      2       5  0.988
7      3       4  0.727
8      3       5  1.244
9      4       5  1.019
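
On the scaling question: an n x n matrix has n(n-1)/2 pairs above the diagonal, so the output grows quadratically. For n = 100,000 that is 4,999,950,000 pairs, which is roughly 40 GB for the float64 values alone, and the two index arrays have the same length. Building everything in one go will probably not fit in memory, so one option is to walk the matrix in blocks of rows. The following is a rough sketch under that assumption, not a benchmarked solution; iter_upper_pairs and block_rows are hypothetical names, and the yielded indices are 0-based positions, not labels.

```python
import numpy as np

def iter_upper_pairs(a, block_rows=1024):
    """Yield (row_positions, col_positions, values) for the strict upper
    triangle of the square array `a`, one block of rows at a time."""
    n = a.shape[0]
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        # Inside the block, local row r is global row r + start, so the
        # condition col > row becomes col - r >= start + 1.
        r, c = np.triu_indices(stop - start, k=start + 1, m=n)
        yield r + start, c, a[start:stop][r, c]

# Small check on the 5x5 example, using deliberately tiny blocks.
a = np.array([
    [0.000, 1.154, 1.235, 1.297, 0.960],
    [1.154, 0.000, 0.932, 0.929, 0.988],
    [1.235, 0.932, 0.000, 0.727, 1.244],
    [1.297, 0.929, 0.727, 0.000, 1.019],
    [0.960, 0.988, 1.244, 1.019, 0.000],
])
chunks = list(iter_upper_pairs(a, block_rows=2))
rows = np.concatenate([ch[0] for ch in chunks])
cols = np.concatenate([ch[1] for ch in chunks])
vals = np.concatenate([ch[2] for ch in chunks])
```

In practice each chunk would be written out (for example appended to a file) instead of being collected in a list, and the positions can be mapped back to labels with df.index[rows] and df.columns[cols].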

Upvotes: 3

Corralien

Reputation: 120539

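Use numpy to extract the values above the diagonal (this returns the distances only, without the row and column labels):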

>>> df.values[np.triu_indices(df.shape[0], k=1)]
array([1.154, 1.235, 1.297, 0.96 , 0.932, 0.929, 0.988, 0.727, 1.244,
       1.019])

Upvotes: 0
