AsheKetchum

Reputation: 1108

Python Getting second largest element in each row of matrix

I begin with a correlation matrix, in which the (i, j)th entry is the correlation between the ith element and the jth element (so the diagonal entries are all 1). For each element, I am trying to find its maximum correlation with any other element (excluding itself, since a list of 1's isn't helpful in my case).

1    0.7  0.4  0.1
0.7  1    0.3  0.2
0.4  0.3  1    0.5
0.1  0.2  0.5  1

Suppose I have the above matrix. I would like to have something like
(max correlation, ith element, jth element). In the above matrix, I would like to get
[(0.7, 0, 1), (0.7, 1, 0), (0.5, 2, 3), (0.5, 3, 2)]
as a result.

What would be a good way to go about this?
I have the matrix as a pandas dataframe. The index and columns have the same name, say [0, 1, 2, 3] for now. Currently I've only thought of doing something like

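D = {}
for i in df.columns:
    # Start below any possible correlation so negative values are handled too
    best = (float("-inf"), None, None)
    for j in df.columns:
        if i == j:
            continue  # skip the diagonal (self-correlation)
        element = df.loc[i, j]
        if element > best[0]:
            best = (element, i, j)
    D[i] = best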

Can this be done better/faster, and are there built in methods that can improve this?

Upvotes: 2

Views: 2008

Answers (3)

AsheKetchum

Reputation: 1108

So I ended up using bits of the ideas from both answers (from Michael and kraskevich), changing the diagonal to some relatively small value like -1, but with a different method.

maxCors = dfFinalCor.apply(lambda x: (max(x), x.idxmax(), x.name)).tolist()

Gives me what I need :)
In addition, I feel like apply works well here. (I'm not sure why, but I don't like to use zip unless I have to.)
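For comparison, here is a minimal sketch of the same idea written as a plain comprehension over the columns instead of apply. How apply wraps tuple results may differ between pandas versions (an assumption worth checking on your install), so this form sidesteps that question. The sample data is the matrix from the question, and masking the diagonal with df.mask and -1.0 is just one illustrative way to do the "small value" step. Note that the tuples come out in the (max correlation, ith element, jth element) order asked for in the question, whereas the one-liner above puts idxmax before name.

```python
import numpy as np
import pandas as pd

# Sample correlation matrix from the question (labels 0..3 on both axes)
df = pd.DataFrame([[1.0, 0.7, 0.4, 0.1],
                   [0.7, 1.0, 0.3, 0.2],
                   [0.4, 0.3, 1.0, 0.5],
                   [0.1, 0.2, 0.5, 1.0]])

# Replace the diagonal with -1.0 on a new frame; df itself is left unchanged
dfFinalCor = df.mask(np.eye(len(df), dtype=bool), -1.0)

# One tuple per column: (largest value, column label, row label of that value)
maxCors = [(col.max(), name, col.idxmax()) for name, col in dfFinalCor.items()]
print(maxCors)
```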

Upvotes: 0

Michael H.

Reputation: 3483

Try this:

import numpy as np

c = np.array([[1. ,  0.7,  0.4,  0.1],
              [0.7,  1. ,  0.3,  0.2],
              [0.4,  0.3,  1. ,  0.5],
              [0.1,  0.2,  0.5,  1. ]])
c -= np.eye(c.shape[0])  # remove the 1 on diagonal
result = np.array([[np.max(row), num_row, np.argmax(row)] for num_row, row in enumerate(c)])

From my understanding of what you mean by correlations, I'm assuming that you always have the value 1 on the diagonal of a symmetric, real-valued, square correlation matrix c and that you don't care about this diagonal entry, so I'm just cancelling it out. Note that this leaves 0 on the diagonal, so it assumes every row has at least one positive off-diagonal correlation. Next, I iterate over the rows of the correlation matrix in the list comprehension. For every row I find the maximum and the index of the maximum with np.max and np.argmax, respectively, which gives the result you wanted.

If you don't want to go with the array (which stores the indices as floats), you can instead use result = [(np.max(row), num_row, np.argmax(row)) for num_row, row in enumerate(c)] (or, in light of the solution by @kraskevich, result = list(zip(np.max(c, axis=1), np.arange(c.shape[0]), np.argmax(c, axis=1)))), which yields exactly your expected output.
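As a variation on the approach above, the following sketch fills the diagonal with -inf on a copy instead of subtracting the identity, so it also works when a row has only negative off-diagonal correlations, and it leaves the original matrix untouched. The names masked, rows, cols and vals are only illustrative.

```python
import numpy as np

c = np.array([[1.0, 0.7, 0.4, 0.1],
              [0.7, 1.0, 0.3, 0.2],
              [0.4, 0.3, 1.0, 0.5],
              [0.1, 0.2, 0.5, 1.0]])

masked = c.copy()                   # work on a copy so c keeps its diagonal
np.fill_diagonal(masked, -np.inf)   # the diagonal can never be the row maximum

cols = masked.argmax(axis=1)        # column of the largest off-diagonal entry per row
rows = np.arange(masked.shape[0])   # row indices 0..n-1
vals = masked[rows, cols]           # the corresponding maximum values

# Convert to plain Python numbers so the indices stay integers
result = [(float(v), int(i), int(j)) for v, i, j in zip(vals, rows, cols)]
print(result)
```

Both argmax and the fancy-indexing lookup run over the whole matrix at once, so no Python-level loop over the rows is needed until the final conversion to tuples.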

Upvotes: 1

kraskevich

Reputation: 18546

Firstly, you can fill the diagonal with a value that is smaller than any correlation coefficient. There's a standard numpy function to do it:

np.fill_diagonal(df.values, -2.)

After that you just need to find the maximum value and its index in each column (a DataFrame has methods for computing both) and zip the results:

list(zip(df.max(), df.columns, df.idxmax()))
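The in-place fill above relies on df.values handing back a writable view of the frame's data, which may not hold for every pandas version or dtype layout. A minimal sketch that avoids that assumption by working on an explicit copy is shown here; the sample data is the matrix from the question, and the names values, masked and pairs are illustrative.

```python
import numpy as np
import pandas as pd

# Sample correlation matrix from the question (labels 0..3 on both axes)
df = pd.DataFrame([[1.0, 0.7, 0.4, 0.1],
                   [0.7, 1.0, 0.3, 0.2],
                   [0.4, 0.3, 1.0, 0.5],
                   [0.1, 0.2, 0.5, 1.0]])

values = df.to_numpy(copy=True)     # explicit, writable copy of the data
np.fill_diagonal(values, -2.0)      # -2 is below any valid correlation
masked = pd.DataFrame(values, index=df.index, columns=df.columns)

# Per column: (maximum value, column label, row label where the maximum occurs)
pairs = list(zip(masked.max(), masked.columns, masked.idxmax()))
print(pairs)
```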

Upvotes: 1
