Reputation: 11504
I have a performance issue with code that calculates the distance between vectors, but a little context is in order before describing the problem.
I have two sets of vectors stored in two dataframes. I want to compute the distance from every vector in one dataframe to every vector in the other. Here is what the dataframes look like (only the first 5 rows are shown here; the full samples are posted at the end of the question as dictionaries):
df_sample =
CalVec
1272 [0.0, 4.0, 8.0, 15.0, 10.0, 8.0, 2.54, 2.0, 4.91, 0.0, 0.0, 0.0, 0.0, 3.59, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 8.0]
657 [1.44, 12.0, 10.0, 5.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 8.23, 4.36, 15.0]
806 [4.58, 13.09, 15.46, 3.59, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 6.31]
771 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 5.59, 11.67, 3.91, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
1370 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 15.0, 2.89, 0.0, 0.0, 0.0, 0.0]
and
DF =
id \
4538 A4060462000516278
5043 A4050494272716275
11663 A4070271111316245
2701 A4060462848716270
825 A4060454573516274
MeasVec
4538 [0.0, 0.0, 0.0, 0.0, 6.0, 15.0, 16.0, 0.0, 0.0, 5.0, 0.0, 15.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.5, 0.0, 3.0]
5043 [0.0, 0.0, 0.0, 0.0, 0.0, 16.0, 12.0, 0.0, 13.0, 15.0, 0.0, 15.0, 0.0, 0.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 3.0, 0.0]
11663 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 15.0, 0.0, 0.0, 0.0, 6.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
2701 [0.0, 0.0, 0.0, 8.0, 13.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6.0, 0.0, 7.0]
825 [0.0, 0.0, 0.0, 0.0, 0.0, 11.0, 15.0, 0.0, 13.0, 16.0, 0.0, 9.0, 3.0, 0.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
In reality df_sample has 1700 rows while DF has 12000 rows. I provide a sample of 10 and 50 rows respectively.
Now, to compute the distances (on my full-size data) I am forced to split the larger dataframe into smaller chunks. My distance computation requires each chunk to have the same number of rows as df_sample, so for every chunk I append zero vectors to df_sample until the two have the same number of rows.
M = len(DF)
N = len(df_sample)
P = int(round(M/N,0))-1
Number_of_id = int(round(M/P,0)) # DF contains only unique ids
Number_AP = 26
def zerolistmaker(n):
    listofzeros = [0.0] * n
    return listofzeros
def split_dataframe(df, chunk_size):
    chunks = list()
    num_chunks = len(df) // chunk_size + 1
    for i in range(num_chunks):
        chunks.append(df[i*chunk_size:(i+1)*chunk_size])
    return chunks
DF_chunked = split_dataframe(DF,Number_of_id)
and here I compute the distances (actually weighted distances, so they are not symmetric, i.e. d(v1,v2) != d(v2,v1)).
import time
import numpy as np
import pandas as pd
t = time.process_time()
DIST = []
for i in range(P):
    vec = DF_chunked[i]
    number_zero_vectors = len(vec)-len(df_sample)
    df = pd.DataFrame(columns = ['CalVec'])
    # note: DataFrame.append was removed in pandas 2.0; pd.concat is the replacement there
    for k in range(number_zero_vectors):
        a = zerolistmaker(Number_AP)
        df = df.append({'CalVec':a},ignore_index=True)
    df_sample_ = pd.concat([df_sample, df])
    m = np.repeat(np.vstack(df_sample_['CalVec']), df_sample_.shape[0], axis=0)
    n = np.tile(np.vstack(vec['MeasVec']), (vec.shape[0], 1))
    d = np.count_nonzero(m, axis=1, keepdims=True)
    dist = np.sqrt(np.sum((m - n)**2/d, axis=-1))
    mi = pd.MultiIndex.from_product([vec['id']] * 2, names=['id2','id'])
    out = pd.DataFrame({'CalVec': m.tolist(),
                        'MeasVec': n.tolist(),
                        'distance': dist}, index=mi).reset_index()
    DIST.append(out)
elapsed_time = time.process_time() - t
distances = pd.concat(DIST)
distances = distances.drop(['id2'], axis = 1)
distances = distances.dropna()
print(elapsed_time)
which gives an elapsed time of 0.0625 seconds and the following distances dataframe:
id \
0 A4060462000516278
1 A4050494272716275
2 A4070271111316245
3 A4060462848716270
CalVec \
0 [0.0, 4.0, 8.0, 15.0, 10.0, 8.0, 2.54, 2.0, 4.91, 0.0, 0.0, 0.0, 0.0, 3.59, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 8.0]
1 [0.0, 4.0, 8.0, 15.0, 10.0, 8.0, 2.54, 2.0, 4.91, 0.0, 0.0, 0.0, 0.0, 3.59, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 8.0]
2 [0.0, 4.0, 8.0, 15.0, 10.0, 8.0, 2.54, 2.0, 4.91, 0.0, 0.0, 0.0, 0.0, 3.59, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 8.0]
3 [0.0, 4.0, 8.0, 15.0, 10.0, 8.0, 2.54, 2.0, 4.91, 0.0, 0.0, 0.0, 0.0, 3.59, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 8.0]
MeasVec \
0 [0.0, 0.0, 0.0, 0.0, 6.0, 15.0, 16.0, 0.0, 0.0, 5.0, 0.0, 15.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.5, 0.0, 3.0]
1 [0.0, 0.0, 0.0, 0.0, 0.0, 16.0, 12.0, 0.0, 13.0, 15.0, 0.0, 15.0, 0.0, 0.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 3.0, 0.0]
2 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 15.0, 0.0, 0.0, 0.0, 6.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
3 [0.0, 0.0, 0.0, 8.0, 13.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6.0, 0.0, 7.0]
distance
0 8.98
1 10.45
2 8.92
3 5.19
Now, this seems fast, but it isn't. The running time grows much faster than linearly with the number of rows, and on the entire sets it takes almost 20 minutes, if the kernel doesn't crash. It consumes so much memory that I cannot do anything else on my computer.
I would appreciate any insight.
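A rough back-of-the-envelope estimate may help explain the memory pressure. The sketch below reuses the chunking arithmetic from the code above with the stated row counts (1700 and 12000) and assumes 8-byte float64 values; the variable names chunk_rows, pairs_per_chunk and bytes_per_array are introduced here only for illustration.

```python
# Row counts stated in the question.
N = 1700          # rows in df_sample
M = 12000         # rows in DF
n_features = 26   # Number_AP in the question

# Chunking arithmetic copied from the question's code.
P = int(round(M / N, 0)) - 1         # number of chunks that are processed
chunk_rows = int(round(M / P, 0))    # Number_of_id in the question

# np.repeat and np.tile each build an array with chunk_rows**2 rows,
# one row per (CalVec, MeasVec) pair, stored as float64 (8 bytes).
pairs_per_chunk = chunk_rows ** 2
bytes_per_array = pairs_per_chunk * n_features * 8

print(P, chunk_rows)            # 6 2000
print(bytes_per_array / 1e9)    # 0.832 (GB) for m alone; n has the same size
```

Under these assumptions m and n each take about 0.8 GB per chunk, before counting the temporaries created by (m - n)**2/d and the Python lists produced by m.tolist() and n.tolist(). The number of pairs grows with the square of the chunk size, which is consistent with the steep slowdown described above.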
DATA
df_sample = {'CalVec': {1272: [0.0,
4.0,
8.0,
15.0,
10.0,
8.0,
2.54,
2.0,
4.91,
0.0,
0.0,
0.0,
0.0,
3.59,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
2.0,
8.0],
657: [1.44,
12.0,
10.0,
5.0,
6.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
2.0,
8.23,
4.36,
15.0],
806: [4.58,
13.09,
15.46,
3.59,
3.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
2.0,
0.0,
6.31],
771: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
4.0,
0.0,
2.0,
0.0,
5.59,
11.67,
3.91,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
1370: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
15.0,
2.89,
0.0,
0.0,
0.0,
0.0],
991: [0.0,
0.0,
0.0,
0.0,
9.0,
1.75,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
6.5,
14.71,
13.0,
9.0],
194: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
4.0,
15.54,
13.0,
2.12,
0.0],
1128: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
2.77,
1.8,
7.0,
6.0,
0.0,
1.8,
0.0,
9.0,
7.0,
0.0,
2.5,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
158: [0.0,
0.0,
0.0,
0.0,
3.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
8.0,
15.44,
13.0,
2.0],
580: [0.0,
2.0,
6.0,
15.64,
2.0,
2.0,
9.23,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
4.23]}}
and
DF = {'id': {4538: 'A4060462000516278',
5043: 'A4050494272716275',
11663: 'A4070271111316245',
2701: 'A4060462848716270',
825: 'A4060454573516274',
8679: 'A4060462010016274',
11700: 'A4060462080916270',
8594: 'A4060461067716272',
8707: 'A4060454363916275',
1071: 'A4060463723916275',
7128: 'A4050494407616274',
8828: 'A4060464006116272',
8505: 'A4050500855716270',
9958: 'A4060462054116273',
2048: 'A4060461032216279',
8522: 'A4050494268116274',
10934: 'A4070270449716242',
10128: 'A4050500604416279',
9453: 'A4050500735216272',
11820: 'A4060462873316274',
7617: 'A4060461991516276',
6930: 'A4050500905516274',
11376: 'A4060454760216279',
5619: 'A4139300114013544',
35: 'A4050470904716271',
7957: 'A4090281675416244',
4216: 'A4050494309816277',
6244: 'A4050494283216272',
11922: 'A4070271196316248',
8914: 'A4060461041916276',
6054: 'A4060462056416278',
12014: 'A4060464023316273',
1362: 'A4050494275316274',
749: 'A4620451876116275',
4405: 'A4620451903216277',
2021: 'A4060454386016271',
7175: 'A4060462829816270',
351: 'A4060454654316272',
5853: 'A4050494877016279',
7980: 'A4050500932116270',
17: 'A4620451899116270',
8234: 'A4050494361416271',
10271: 'A4050500470516271',
1325: 'A4050500771516275',
2391: 'A4050500683216274',
372: 'A4050494830916277',
5527: 'A4050490253316276',
5431: 'A4050500884316278',
717: 'A4060461998716275',
10015: 'A4050500032916279'},
'MeasVec': {4538: [0.0,
0.0,
0.0,
0.0,
6.0,
15.0,
16.0,
0.0,
0.0,
5.0,
0.0,
15.0,
0.0,
0.0,
0.0,
0.0,
2.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
3.5,
0.0,
3.0],
5043: [0.0,
0.0,
0.0,
0.0,
0.0,
16.0,
12.0,
0.0,
13.0,
15.0,
0.0,
15.0,
0.0,
0.0,
6.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
3.0,
3.0,
0.0],
11663: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
5.0,
15.0,
0.0,
0.0,
0.0,
6.0,
2.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
2701: [0.0,
0.0,
0.0,
8.0,
13.0,
16.0,
6.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
6.0,
0.0,
7.0],
825: [0.0,
0.0,
0.0,
0.0,
0.0,
11.0,
15.0,
0.0,
13.0,
16.0,
0.0,
9.0,
3.0,
0.0,
6.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
8679: [0.0,
4.0,
9.0,
15.0,
10.0,
3.0,
2.0,
0.0,
2.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
2.0,
9.0],
11700: [0.0,
0.0,
6.0,
0.0,
15.0,
8.0,
2.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
16.0,
0.0,
6.0],
8594: [12.0,
16.0,
16.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
8.0,
0.0,
5.0],
8707: [7.0,
5.0,
2.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
2.0,
8.0,
15.0],
1071: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
12.0,
15.5,
6.0,
3.5,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
7128: [0.0,
0.0,
0.0,
0.0,
10.0,
15.0,
16.0,
0.0,
8.0,
12.0,
0.0,
12.0,
0.0,
0.0,
4.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
3.0,
0.0,
11.0],
8828: [0.0,
0.0,
0.0,
0.0,
11.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
12.0,
15.0,
15.0,
7.0],
8505: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
15.0,
16.0,
4.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
9958: [0.0,
0.0,
0.0,
0.0,
14.0,
9.0,
6.0,
0.0,
0.0,
0.0,
0.0,
13.0,
0.0,
0.0,
7.0,
3.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
7.0,
0.0,
6.0],
2048: [0.0,
0.0,
0.0,
11.0,
0.0,
16.0,
14.0,
0.0,
7.0,
5.0,
0.0,
2.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
8522: [0.0,
0.0,
0.0,
4.0,
4.0,
16.0,
9.0,
0.0,
0.0,
3.0,
0.0,
14.0,
0.0,
0.0,
5.5,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
11.5,
0.0,
0.0],
10934: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
2.0,
8.0,
4.5,
0.0,
2.0,
0.0,
15.0,
5.0,
0.0,
2.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
10128: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
7.0,
12.0,
0.0,
12.0,
5.0,
3.0,
6.0,
0.0,
6.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
9453: [0.0,
0.0,
5.0,
16.0,
0.0,
2.0,
6.0,
0.0,
4.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
11820: [0.0,
0.0,
0.0,
10.0,
9.0,
15.0,
3.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
7617: [0.0,
3.0,
10.0,
9.0,
15.0,
11.0,
8.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
2.0,
2.0,
15.0],
6930: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
2.0,
0.0,
0.0,
0.0,
10.0,
15.5,
14.0,
15.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
11376: [0.0,
0.0,
10.0,
7.0,
7.0,
11.0,
7.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
2.0,
16.0],
5619: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
5.0,
12.0,
14.0,
2.5,
2.0,
8.0],
35: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
13.0,
16.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
7957: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
4.5,
0.0,
7.0,
7.0,
0.0,
2.0,
0.0,
15.0,
8.0,
4.5,
4.5,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
4216: [16.0,
6.0,
2.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
3.0,
5.0],
6244: [11.0,
7.0,
2.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
5.0,
10.0],
11922: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
5.0,
15.0,
0.0,
0.0,
0.0,
2.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
8914: [2.0,
0.0,
4.0,
0.0,
2.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
5.0,
15.0],
6054: [0.0,
0.0,
0.0,
0.0,
15.0,
9.0,
5.0,
0.0,
0.0,
0.0,
0.0,
13.0,
0.0,
0.0,
6.0,
2.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
8.0,
0.0,
6.0],
12014: [3.0,
7.0,
6.0,
0.0,
14.0,
3.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
5.0,
16.0],
1362: [15.0,
16.0,
5.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
5.0,
0.0],
749: [14.0,
15.0,
16.0,
3.0,
3.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
6.0,
2.0,
12.0],
4405: [11.0,
16.0,
16.0,
3.0,
4.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
4.0],
2021: [0.0,
0.0,
0.0,
0.0,
0.0,
8.0,
16.0,
0.0,
0.0,
4.0,
0.0,
6.5,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
7175: [0.0,
0.0,
0.0,
2.0,
9.0,
16.0,
15.0,
0.0,
0.0,
3.0,
0.0,
3.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
5.0,
0.0,
5.0],
351: [0.0,
0.0,
0.0,
0.0,
12.0,
16.0,
5.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
7.0,
0.0,
5.0],
5853: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
16.0,
8.0,
1.5,
0.0,
0.0,
0.0,
4.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
7980: [0.0,
0.0,
13.0,
8.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
4.0],
17: [11.0,
16.0,
16.0,
6.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
3.0,
11.0],
8234: [0.0,
0.0,
0.0,
0.0,
0.0,
6.0,
7.0,
5.0,
11.0,
13.0,
0.0,
13.0,
3.0,
11.0,
15.0,
12.0,
12.0,
5.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
10271: [0.0,
0.0,
0.0,
0.0,
6.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
9.0,
0.0,
15.0,
9.0,
5.0,
5.0],
1325: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
3.0,
0.0,
0.0,
5.0,
0.0,
16.0,
0.0,
0.0,
9.0,
0.0,
5.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
2391: [0.0,
0.0,
3.0,
16.0,
0.0,
0.0,
0.0,
0.0,
2.0,
0.0,
0.0,
0.0,
0.0,
4.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
2.0],
372: [0.0,
0.0,
0.0,
0.0,
4.0,
16.0,
10.0,
0.0,
0.0,
3.0,
0.0,
12.0,
0.0,
0.0,
3.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
7.0,
6.0,
0.0],
5527: [0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
5.0,
0.0,
2.0,
0.0,
0.0,
14.0,
16.0,
7.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
5431: [0.0,
0.0,
0.0,
0.0,
2.0,
3.0,
8.0,
0.0,
4.0,
7.0,
0.0,
16.0,
0.0,
0.0,
8.0,
2.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
717: [0.0,
0.0,
0.0,
11.0,
2.0,
14.0,
9.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0],
10015: [0.0,
0.0,
0.0,
7.0,
14.0,
16.0,
15.0,
0.0,
4.0,
9.0,
0.0,
11.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
6.0,
3.0,
12.0]}}
Upvotes: 1
Views: 108
Reputation: 16561
Distance calculation is a common problem, so it is a good idea to use existing functions for it, specifically from sklearn. The data you provided is not convenient to work with, but the example below might give you ideas on how to adapt this workflow to the specifics of your data:
import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances
X = pd.DataFrame(np.random.rand(10, 30))
Y = pd.DataFrame(np.random.rand(20, 30))
def custom_distance(x, y):
    """Sample asymmetric function."""
    return max(x) + min(y)
# use n_jobs=-1 to run calculations with all cores
result = pairwise_distances(X, Y, metric=custom_distance, n_jobs=-1)
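The sample function above is only a placeholder. A metric closer to the one in the question might look like the sketch below, assuming the weight is the number of non-zero entries of the first vector (this is read off the np.count_nonzero line in the question, not confirmed by the asker). The name weighted_distance is hypothetical, and the two vectors are the first CalVec and MeasVec rows from the question's data.

```python
import numpy as np

def weighted_distance(cal, meas):
    """Weighted Euclidean distance as computed in the question.

    The summed squared differences are divided by the number of non-zero
    entries of the first argument, so swapping the arguments changes the
    result. Assumes `cal` has at least one non-zero entry.
    """
    cal = np.asarray(cal, dtype=float)
    meas = np.asarray(meas, dtype=float)
    weight = np.count_nonzero(cal)
    return float(np.sqrt(np.sum((cal - meas) ** 2) / weight))

# CalVec at index 1272 and MeasVec at index 4538, copied from the question.
cal_1272 = [0.0, 4.0, 8.0, 15.0, 10.0, 8.0, 2.54, 2.0, 4.91, 0.0, 0.0, 0.0, 0.0,
            3.59, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 8.0]
meas_4538 = [0.0, 0.0, 0.0, 0.0, 6.0, 15.0, 16.0, 0.0, 0.0, 5.0, 0.0, 15.0, 0.0,
             0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.5, 0.0, 3.0]

print(round(weighted_distance(cal_1272, meas_4538), 2))  # 8.98, the first row of the question's output
print(round(weighted_distance(meas_4538, cal_1272), 2))  # 10.53, so the argument order matters
```

A function with this signature (two 1-D arrays in, one float out) can be passed as metric= to pairwise_distances in place of custom_distance.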
To complete @SultanOrazbayev's answer:
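import numpy as np
from sklearn.metrics import pairwise_distances
# convert each list to a NumPy array (the columns are overwritten in place)
Ax = df_sample['CalVec'] = df_sample['CalVec'].apply(lambda x: np.array(x))
Bx = DF['MeasVec'] = DF['MeasVec'].apply(lambda x: np.array(x))
A = Ax.to_numpy()
B = Bx.to_numpy()
# stack the per-row arrays into 2-D matrices with one vector per row
AA = np.stack(A)
BB = np.stack(B)
# custom_distance must be defined beforehand, e.g. as in the answer above
result = pairwise_distances(AA, BB, metric=custom_distance, n_jobs=-1)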
which runs in under 3 minutes.
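A callable metric is evaluated pair by pair in Python, so a fully vectorized NumPy version may be faster still. The sketch below is an alternative technique, not the method used in either answer: it expands the squared Euclidean distance as ||c||^2 + ||m||^2 - 2 c.m to get all pairs from one matrix product, then divides each row by its non-zero count, assuming that is the intended weight. The function name weighted_distance_matrix and the random test data are hypothetical.

```python
import numpy as np

def weighted_distance_matrix(cal, meas):
    """All-pairs weighted distances without np.repeat / np.tile.

    D[i, j] = sqrt(sum((cal[i] - meas[j])**2) / nnz(cal[i])),
    where nnz counts the non-zero entries of cal[i].
    Assumes every row of `cal` has at least one non-zero entry.
    """
    cal = np.asarray(cal, dtype=float)
    meas = np.asarray(meas, dtype=float)
    # ||c - m||^2 = ||c||^2 + ||m||^2 - 2 c.m, evaluated for all pairs at once.
    sq = (
        np.sum(cal ** 2, axis=1)[:, None]
        + np.sum(meas ** 2, axis=1)[None, :]
        - 2.0 * cal @ meas.T
    )
    np.maximum(sq, 0.0, out=sq)  # clip tiny negatives caused by rounding
    weights = np.count_nonzero(cal, axis=1)[:, None]
    return np.sqrt(sq / weights)

# Hypothetical stand-ins for np.vstack(df_sample['CalVec']) and np.vstack(DF['MeasVec']).
rng = np.random.default_rng(0)
cal = rng.random((4, 26)) + 0.1
meas = rng.random((6, 26))
D = weighted_distance_matrix(cal, meas)
print(D.shape)  # (4, 6)
```

No padding or chunking is needed here because the two inputs may have different numbers of rows. For 1700 by 12000 vectors the result holds 20.4 million float64 values, roughly 163 MB. The expansion can be slightly less accurate than subtracting the vectors directly, which is why small negative values are clipped before the square root.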
Upvotes: 1