O.rka

Reputation: 30677

Convert vertical matrix to correlation matrix. Python

I used the pd.DataFrame.corr() method to create a correlation matrix from my DataFrame, then filtered out certain values, which left me with a long-format table similar to DF_interactions below. I now want to convert it back into a correlation-style matrix, such as DF_corr below.

What is the most efficient way using pandas, numpy, sklearn, or scipy to convert a table of interactions to a correlation-style matrix?

My naive method of filling this DataFrame is included below:

import numpy as np
import pandas as pd

#Create table of interactions
DF_interactions=pd.DataFrame([["A","B",0.1],
                              ["A","C",0.4],
                              ["B","C",0.3],
                              ["A","D",0.4]],columns=["var1","var2","corr"])
#   var1 var2  corr
# 0    A    B   0.1
# 1    A    C   0.4
# 2    B    C   0.3
# 3    A    D   0.4
n,m = DF_interactions.shape
# n = 4, m = 3
#Show which labels would be in correlation matrix for rows/columns
nodes = set(DF_interactions["var1"]) | set(DF_interactions["var2"])
#set(['A', 'C', 'B', 'D'])

#Create empty DataFrame to fill
DF_corr = pd.DataFrame(np.zeros((len(nodes),len(nodes))), columns = sorted(nodes),index=sorted(nodes))
#    A  B  C  D
# A  0  0  0  0
# B  0  0  0  0
# C  0  0  0  0
# D  0  0  0  0

#Naive way to fill it
for i in range(n):
    var1 = DF_interactions.iloc[i,0]
    var2 = DF_interactions.iloc[i,1]
    corr = DF_interactions.iloc[i,2]
    DF_corr.loc[var1,var2] = corr
    DF_corr.loc[var2,var1] = corr
#      A    B    C    D
# A  0.0  0.1  0.4  0.4
# B  0.1  0.0  0.3  0.0
# C  0.4  0.3  0.0  0.0
# D  0.4  0.0  0.0  0.0

Upvotes: 1

Views: 962

Answers (1)

Stefan

Reputation: 42885

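Assuming your table of interactions lists each pair only once (i.e. it holds only one triangle of the matrix), mirror it by swapping var1 and var2 and concatenating the result with the original. If some pairs may already appear in both directions, append .drop_duplicates() to the result, since .pivot() raises a ValueError on duplicate index/column pairs: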

corr = pd.concat([DF_interactions, DF_interactions.rename(columns={'var1': 'var2', 'var2': 'var1'})])

Then use .pivot():

corr = corr.pivot(index='var1', columns='var2', values='corr')

var2    A    B    C    D
var1                    
A     NaN  0.1  0.4  0.4
B     0.1  NaN  0.3  NaN
C     0.4  0.3  NaN  NaN
D     0.4  NaN  NaN  NaN

If you prefer 0 values for missing interactions, use .fillna(0). Note that this also sets the diagonal to 0, which matches DF_corr in the question.
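Putting the steps together, here is a minimal end-to-end sketch using the sample data from the question. The intermediate names (mirrored, long_form) are only illustrative, and the drop_duplicates() and rename_axis() calls are optional additions that assume you want to guard against pairs listed in both directions and to drop the var1/var2 axis labels:

```python
import pandas as pd

# Sample data copied from the question
DF_interactions = pd.DataFrame(
    [["A", "B", 0.1], ["A", "C", 0.4], ["B", "C", 0.3], ["A", "D", 0.4]],
    columns=["var1", "var2", "corr"],
)

# Mirror every pair so that (A, B) also appears as (B, A)
mirrored = DF_interactions.rename(columns={"var1": "var2", "var2": "var1"})
long_form = pd.concat([DF_interactions, mirrored], ignore_index=True)

# Optional guard: keep one row per (var1, var2) pair so pivot() does not fail
long_form = long_form.drop_duplicates(subset=["var1", "var2"])

# Reshape to a square matrix and fill missing interactions with 0
DF_corr = long_form.pivot(index="var1", columns="var2", values="corr").fillna(0)

# Optional: remove the "var1"/"var2" axis names from the result
DF_corr = DF_corr.rename_axis(index=None, columns=None)

print(DF_corr)
```

Because every label ends up in both var1 and var2 after mirroring, the pivoted result is square and symmetric, with rows and columns sorted by label.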

Upvotes: 1
