Reputation: 1711
I am continuously calculating correlation matrices where each time the order of the underlying data is randomized. When a correlation score with randomized data is greater than or equal to the original correlation determined with ordered data, I would like to update the corresponding cell in a scoring matrix with +1. (All cells begin as zeroes in the scoring matrix).
Due to the size of the matrices I am dealing with shape = (3681, 12709)
, I would like to find out an efficient way of doing this. So far, what I have is inefficient and takes too long. I wonder if there is a matrix-operation style approach to this rather than iterating, as I am currently doing below:
for i, j in product(data_sorted.index, data_sorted.columns):
# if random correlation is as good as or better than sorted correlation
if data_random.loc[i, j] >= data_sorted.loc[i, j]:
# update scoring matrix
scoring_matrix[sorted_index_list.index(i)][sorted_column_list.index(j)] += 1
I have crudely timed this approach and found that doing this for a single line of my matrix will take roughly 4.2 seconds which seems excessive.
Any help would he much obliged.
Upvotes: 0
Views: 56
Reputation: 52276
Assuming everything has the same indices, this should work as expected and be pretty quick.
scoring_matrix += (data_random >= data_sorted).astype(int)
Upvotes: 2