Platalea Minor

Reputation: 877

Calculate correlation coefficient by row in pandas

I am trying to calculate the correlation coefficient between every possible pair of rows in a DataFrame. How can I do this?

My code:

import pandas as pd

d = {'Name': ['A', 'B','C'], 'v1': [1,3, 4], 'v2': [3,2, 4], 'v3': [3,9 ,1]}
df = pd.DataFrame(data=d)
result = df.T.corr().unstack().reset_index(name="corr")

But it raises the error IndexError: list index out of range.

Thank you for your assistance

Upvotes: 1

Views: 392

Answers (1)

Rob Raymond

Reputation: 31156

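  1. First, make Name the index so that the transpose contains only numeric values.
  2. After calling corr(), rename the column axis so that its name does not clash with the index name.
  3. Finally, rename the columns after reset_index().

import pandas as pd

d = {'Name': ['A', 'B','C'], 'v1': [1,3, 4], 'v2': [3,2, 4], 'v3': [3,9 ,1]}
# Make Name the index so the transpose holds only numeric values
df = pd.DataFrame(data=d).set_index("Name")
# Transposing turns each row into a column, so corr() compares the original rows
result = df.T.corr()
# Name the column axis "NameX"; the index axis is still called "Name"
result.columns.set_names("NameX", inplace=True)
result = result.unstack().to_frame().reset_index().rename(columns={"Name":"NameY",0:"corr"})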

Output:

NameX NameY      corr
    A     A  1.000000
    A     B  0.381246
    A     C -0.500000
    B     A  0.381246
    B     B  1.000000
    B     C -0.991241
    C     A -0.500000
    C     B -0.991241
    C     C  1.000000
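
For comparison, here is a minimal sketch of the same three steps written as a single method chain. It is not part of the answer above; it assumes a pandas version in which rename_axis accepts the index= and columns= keywords, and it names both axes in one call rather than renaming a column after reset_index(). It should give the same NameX / NameY / corr layout as the output shown.

```python
import pandas as pd

d = {'Name': ['A', 'B', 'C'], 'v1': [1, 3, 4], 'v2': [3, 2, 4], 'v3': [3, 9, 1]}
df = pd.DataFrame(data=d).set_index("Name")

# Transpose so each original row becomes a column, then correlate the columns.
corr = df.T.corr()

# Name both axes at once, flatten to a long Series, and name the value column.
result = (
    corr.rename_axis(index="NameY", columns="NameX")
        .unstack()          # MultiIndex levels: (NameX, NameY)
        .rename("corr")
        .reset_index()
)
print(result)
```

Naming the axes before unstack() means no integer column label 0 ever appears, so the final rename(columns=...) step is not needed.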

Upvotes: 1
