Sirrah

Reputation: 1711

Efficiently update values held in scoring matrix

I am repeatedly calculating correlation matrices, each time with the order of the underlying data randomized. Whenever a correlation score from the randomized data is greater than or equal to the original correlation computed from the ordered data, I would like to add 1 to the corresponding cell in a scoring matrix. (All cells in the scoring matrix start at zero.)

Because the matrices are large (shape = (3681, 12709)), I would like to find an efficient way of doing this. What I have so far is too slow. Is there a matrix-operation style approach I could use instead of iterating, as I currently do below?

from itertools import product

for i, j in product(data_sorted.index, data_sorted.columns):

    # if random correlation is as good as or better than sorted correlation
    if data_random.loc[i, j] >= data_sorted.loc[i, j]:

        # update scoring matrix
        scoring_matrix[sorted_index_list.index(i)][sorted_column_list.index(j)] += 1

I have crudely timed this approach and found that processing a single line of my matrix takes roughly 4.2 seconds, which seems excessive.

Any help would be much appreciated.

Upvotes: 0

Views: 56

Answers (1)

chrisb

Reputation: 52276

Assuming all three objects share the same index and columns, this vectorized comparison should work as expected and be pretty quick.

scoring_matrix += (data_random >= data_sorted).astype(int)
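For anyone who wants to try the one-liner on something small first, here is a minimal sketch. It assumes `data_sorted` and `data_random` are pandas DataFrames and that `scoring_matrix` is a NumPy integer array of the same shape. The helper name `update_scores` and the toy values are made up for illustration and are not from the question. The `reindex` call is only a safeguard in case the random matrix comes back with its labels in a different order.

```python
import numpy as np
import pandas as pd


def update_scores(scoring_matrix, data_random, data_sorted):
    """Add 1 to every cell where the randomized correlation is >= the sorted one.

    Hypothetical helper: assumes scoring_matrix is an integer ndarray laid out
    in the same row/column order as data_sorted.
    """
    # Line the random matrix up with the sorted one by label, in case the order differs.
    aligned = data_random.reindex(index=data_sorted.index, columns=data_sorted.columns)
    # Element-wise comparison yields a boolean array; True is added as 1, in place.
    scoring_matrix += aligned.to_numpy() >= data_sorted.to_numpy()
    return scoring_matrix


# Toy stand-ins for the real (3681, 12709) correlation matrices.
data_sorted = pd.DataFrame([[0.5, 0.2], [0.1, 0.9]], index=["a", "b"], columns=["x", "y"])
data_random = pd.DataFrame([[0.6, 0.1], [0.1, 0.3]], index=["a", "b"], columns=["x", "y"])

scoring_matrix = np.zeros(data_sorted.shape, dtype=np.int64)
update_scores(scoring_matrix, data_random, data_sorted)
print(scoring_matrix)
```

Calling `update_scores` once per randomization accumulates the counts without any Python-level loop over cells. If the labels are known to match already, the `reindex` step can be dropped.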

Upvotes: 2
