Pairwise correlations in dataframe

Question

I have a dataframe as follows:

print(df)
     SAS_a1   SAS2_a1   SAS3_a1    FDF_b1   FDF2_b1
0  0.673114  0.745755  0.989468  0.498920  0.837440
1  0.811218  0.392196  0.505301  0.615603  0.946847
2  0.252856  0.709125  0.321580  0.826123  0.224813
3  0.566833  0.738661  0.626808  0.815460  0.003738
4  0.102995  0.171741  0.246565  0.784519  0.980965

I am trying to compute pairwise correlations with pearsonr, but only between the columns ending in a1 and the columns ending in b1. The final result should look like this:

                       PCC   p-value
SAS_a1__FDF_b1   -0.293373  0.631895
SAS_a1__FDF2_b1  -0.947724  0.014235
SAS2_a1__FDF_b1   0.771389  0.126618
SAS2_a1__FDF2_b1  0.132380  0.831942
SAS3_a1__FDF_b1   0.422249  0.478808
SAS3_a1__FDF2_b1  0.346411  0.567923

Any suggestions would be great. Here is what I tried:

columns = df.columns.tolist()
for col_a, col_b in itertools.combinations(columns, 2):
    correlations[col_a + '__' + col_b] = pearsonr(df.loc[:, col_a], df.loc[:, col_b])
results = DataFrame.from_dict(correlations, orient='index')
results.columns = ['PCC', 'p-value']

P.Tillmann · Accepted Answer

I don't know if it's the most elegant solution, but you can use list comprehensions to pick out the relevant columns and then loop over both lists:

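import pandas as pd
from scipy.stats import pearsonr

# Pair every column ending in 'a1' with every column ending in 'b1'
a1_columns = [column for column in df.columns if column.endswith('a1')]
b1_columns = [column for column in df.columns if column.endswith('b1')]

# DataFrame.append was removed in pandas 2.0, so collect the rows first
rows = {}
for a1 in a1_columns:
  for b1 in b1_columns:
    pcc, p_value = pearsonr(df[a1], df[b1])
    rows[a1 + '__' + b1] = (pcc, p_value)

result = pd.DataFrame.from_dict(rows, orient='index', columns=['PCC', 'p-value'])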
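If you need this more than once, the same idea can be wrapped in a small helper. This is only a sketch: the function name pairwise_pearson and its suffix parameters are made up for illustration, and the sample values are copied from the question. It collects the results in a dict and builds the dataframe in one step, which also works on pandas 2.0 and later, where DataFrame.append no longer exists.

```python
from itertools import product

import pandas as pd
from scipy.stats import pearsonr


def pairwise_pearson(df, suffix_a, suffix_b):
    """Correlate every column ending in suffix_a with every column ending in suffix_b.

    Hypothetical helper, not part of pandas or scipy.
    """
    cols_a = [c for c in df.columns if c.endswith(suffix_a)]
    cols_b = [c for c in df.columns if c.endswith(suffix_b)]
    rows = {}
    for a, b in product(cols_a, cols_b):
        # pearsonr returns the correlation coefficient and the p-value
        pcc, p_value = pearsonr(df[a], df[b])
        rows[a + '__' + b] = (pcc, p_value)
    return pd.DataFrame.from_dict(rows, orient='index', columns=['PCC', 'p-value'])


# Sample data copied from the question
df = pd.DataFrame({
    'SAS_a1':  [0.673114, 0.811218, 0.252856, 0.566833, 0.102995],
    'SAS2_a1': [0.745755, 0.392196, 0.709125, 0.738661, 0.171741],
    'SAS3_a1': [0.989468, 0.505301, 0.321580, 0.626808, 0.246565],
    'FDF_b1':  [0.498920, 0.615603, 0.826123, 0.815460, 0.784519],
    'FDF2_b1': [0.837440, 0.946847, 0.224813, 0.003738, 0.980965],
})

result = pairwise_pearson(df, 'a1', 'b1')
print(result)
```

Using product instead of combinations is what restricts the output to a1-versus-b1 pairs: combinations(columns, 2) also pairs a1 columns with each other, which is why the attempt in the question returns more rows than wanted.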

PS: It would be great if you included your imports in your next question, so that people answering don't have to google them.
