Reputation: 1108
I begin with a correlation matrix, where the (i, j)th entry is the correlation between the ith element and the jth element (so the diagonal entries are 1). For each element, I am trying to find its maximum correlation with another element (excluding itself, since a list of 1's isn't helpful in my case).
1 0.7 0.4 0.1
0.7 1 0.3 0.2
0.4 0.3 1 0.5
0.1 0.2 0.5 1
Suppose I have the above matrix. I would like to have something like
(max correlation, ith element, jth element). In the above matrix, I would like to get
[(0.7, 0, 1), (0.7, 1, 0), (0.5, 2, 3), (0.5, 3, 2)]
as a result.
What would be a good way to go about this?
I have the matrix as a pandas DataFrame. The index and columns have the same labels, say [0, 1, 2, 3] for now. Currently I've only thought of doing something like
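D = {}
for i in df.columns:
    # (correlation, i, j); start below any possible correlation
    best = (float("-inf"), None, None)
    for j in df.columns:
        if i == j:
            continue  # skip the diagonal
        element = df.loc[i, j]
        if element > best[0]:
            best = (element, i, j)
    D[i] = best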
Can this be done better/faster, and are there built-in methods that can improve this?
Upvotes: 2
Views: 2008
Reputation: 1108
So I ended up using bits of the ideas from both answers (from Michael and kraskevich), changing the diagonal to some relatively small value like -1, but with a different method.
maxCors = dfFinalCor.apply(lambda x: (max(x), x.idxmax(), x.name)).tolist()
Gives me what I need :)
In addition, I feel like apply works well here. (I'm not sure why, but I don't like to use zip unless I have to.)
Upvotes: 0
Reputation: 3483
Try this:
import numpy as np
c = np.array([[1. , 0.7, 0.4, 0.1],
              [0.7, 1. , 0.3, 0.2],
              [0.4, 0.3, 1. , 0.5],
              [0.1, 0.2, 0.5, 1. ]])
c -= np.eye(c.shape[0]) # remove the 1 on diagonal
result = np.array([[np.max(row), num_row, np.argmax(row)] for num_row, row in enumerate(c)])
From my understanding of what you mean by correlations, I'm assuming that you always have the value 1 on the diagonal of some symmetric real-valued square correlation matrix c and that you don't care about this diagonal entry, so I'm just cancelling it out. Next, I iterate over all the rows of the correlation matrix in the list comprehension. For every row I find the maximum and the index of the maximum with np.max and np.argmax, respectively, which gives the result you wanted. If you don't want to go with the array, you can instead use result = [(np.max(row), num_row, np.argmax(row)) for num_row, row in enumerate(c)] (or, in light of the solution by @kraskevich, result = list(zip(np.max(c, axis=1), np.arange(c.shape[0]), np.argmax(c, axis=1)))), which yields exactly your expected output.
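One caveat with subtracting the identity: it sets the diagonal to 0, so a row whose off-diagonal correlations are all negative would report the diagonal as its maximum. A minimal sketch of a variant that avoids this, assuming it is acceptable to work on a copy and fill the diagonal with -inf (the helper name max_offdiag_per_row is just illustrative, not from the original post):

```python
import numpy as np

def max_offdiag_per_row(corr):
    """Return (max value, row index, column index) per row, ignoring the diagonal."""
    c = np.array(corr, dtype=float)   # np.array copies, so the input is left untouched
    np.fill_diagonal(c, -np.inf)      # -inf can never be the row maximum
    cols = np.argmax(c, axis=1)       # column of the largest off-diagonal entry in each row
    rows = np.arange(c.shape[0])
    vals = c[rows, cols]              # pick out those maxima with fancy indexing
    return [(float(v), int(i), int(j)) for v, i, j in zip(vals, rows, cols)]

c = np.array([[1. , 0.7, 0.4, 0.1],
              [0.7, 1. , 0.3, 0.2],
              [0.4, 0.3, 1. , 0.5],
              [0.1, 0.2, 0.5, 1. ]])
result = max_offdiag_per_row(c)
```

For the example matrix this should give the same list of tuples as the question asks for, and the original c still has its 1's on the diagonal afterwards.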
Upvotes: 1
Reputation: 18546
First, you can fill the diagonal with a value that is smaller than any correlation coefficient. There's a standard numpy function to do it:
np.fill_diagonal(df.values, -2.)
After that you just need to find the maximum value and its index in each column (a DataFrame has methods for computing both) and zip the results:
list(zip(df.max(), df.columns, df.idxmax()))
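Note that filling df.values in place relies on it being a writable view of the frame, which may not hold depending on the pandas version and the dtypes involved. A rough sketch of the same idea that does not modify df, assuming the example matrix from the question and using DataFrame.mask to blank out the diagonal with NaN (the names masked and pairs are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, 0.7, 0.4, 0.1],
                   [0.7, 1.0, 0.3, 0.2],
                   [0.4, 0.3, 1.0, 0.5],
                   [0.1, 0.2, 0.5, 1.0]])

# Replace the diagonal with NaN on a new frame; df itself is left unchanged
masked = df.mask(np.eye(len(df), dtype=bool))

# max() and idxmax() skip NaN by default, so the diagonal is ignored
pairs = list(zip(masked.max(), masked.columns, masked.idxmax()))
```

Since the matrix is symmetric, taking the maximum per column gives the same pairs as taking it per row.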
Upvotes: 1