Reputation: 197
I have a data frame df with some data in 4 columns and a function distance(row_1, row_2) that computes the distance between 2 rows of the data frame. I want to create a matrix of distances between each pair of rows (I don't mind having repeated pairs like 0,1 and 1,0). I thought of creating an empty data frame, but I am unsure how to do this. Any help would be appreciated!
Data frame looks like this:
          A |   B |   C |  D |
    0    12 |  22 | 112 |  9 |
    1    14 |  47 |  71 | 18 |
    2     5 | 109 |  63 | 20 |
    ...
Output should look like this:
          0 |   1 |   2 | ...
    0     0 |  77 | 154 |
    1    77 |   0 |  81 |
    2   154 |  81 |   0 |
    ...
The distance function:
    def absolute_difference(a, b):
        return abs(a - b)

    def manhattan_distance(a, b):
        d = 0
        a_list = a.values.tolist()[0]
        b_list = b.values.tolist()[0]
        for i in range(len(a_list)):
            d += absolute_difference(a_list[i], b_list[i])
        return d
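For reference, here is a quick check of the function on the sample rows above. This is only a sketch: it assumes each row is passed as a one-row DataFrame (`df.iloc[[i]]`, with double brackets), since the `.values.tolist()[0]` call needs a 2-D input. The small `df` is rebuilt here from the three rows shown.

```python
import pandas as pd

def absolute_difference(a, b):
    return abs(a - b)

def manhattan_distance(a, b):
    d = 0
    # a and b are assumed to be one-row DataFrames; [0] extracts that row as a list
    a_list = a.values.tolist()[0]
    b_list = b.values.tolist()[0]
    for i in range(len(a_list)):
        d += absolute_difference(a_list[i], b_list[i])
    return d

# The three sample rows from the question
df = pd.DataFrame({"A": [12, 14, 5], "B": [22, 47, 109],
                   "C": [112, 71, 63], "D": [9, 18, 20]})

d01 = manhattan_distance(df.iloc[[0]], df.iloc[[1]])
d02 = manhattan_distance(df.iloc[[0]], df.iloc[[2]])
print(d01, d02)  # 77 154, matching the expected output table
```

Passing a plain row such as `df.iloc[0]` (a Series) would not work with this function as written, because `.values.tolist()[0]` would then return a single number instead of a list.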
Upvotes: 0
Views: 57
Reputation: 503
I feel like there's a more "pythonic" answer involving apply, but here you go:
    import numpy as np
    import pandas as pd

    n = df.shape[0]
    distance = pd.DataFrame(np.zeros((n, n)))
    for i in range(n):
        for j in range(i, n):
            # double brackets keep each row as a one-row DataFrame, which manhattan_distance expects
            d = manhattan_distance(df.iloc[[i]], df.iloc[[j]])
            distance.iloc[i, j] = d
            distance.iloc[j, i] = d  # the distance is symmetric, so mirror it
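Here is the version with apply:
    # x and y arrive as Series, so convert each back to a one-row DataFrame first
    df.apply(lambda x: df.apply(lambda y: manhattan_distance(x.to_frame().T, y.to_frame().T), axis=1), axis=1)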
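If the distance is always the Manhattan distance, the Python-level loops can probably be avoided altogether. The following is a minimal sketch using NumPy broadcasting, not a drop-in replacement for an arbitrary distance function. It assumes all columns are numeric, and the small `df` is rebuilt from the sample rows in the question.

```python
import numpy as np
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({"A": [12, 14, 5], "B": [22, 47, 109],
                   "C": [112, 71, 63], "D": [9, 18, 20]})

values = df.to_numpy()

# values[:, None, :] has shape (n, 1, k) and values[None, :, :] has shape (1, n, k),
# so the subtraction broadcasts to (n, n, k): one difference vector per pair of rows
diffs = np.abs(values[:, None, :] - values[None, :, :])

# Summing over the last axis gives the Manhattan distance for every pair
distance = pd.DataFrame(diffs.sum(axis=2), index=df.index, columns=df.index)
print(distance)
```

Note that the intermediate array holds n * n * k values, so for very large frames this trades memory for speed. If SciPy is available, `scipy.spatial.distance.cdist(values, values, metric="cityblock")` should give the same matrix.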
Upvotes: 1