Alex Xu

Reputation: 105

How to select columns that are highly correlated with one specific column in a dataframe

I have a dataframe with over 100 columns that I am using to build a model. One column (A) is the response and all the other columns (B, C, D, etc.) are predictors. I want to select all the columns whose correlation coefficient with column A is above some threshold (say > 0.2, which I will adjust if needed). I have already generated a heatmap of the correlation coefficients between every pair of columns. Is there a quick way in pandas to get all the columns whose correlation with column A is over 0.2? Thanks in advance!

Upvotes: 3

Views: 2551

Answers (1)

ALollz

Reputation: 59519

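Use DataFrame.corr() to calculate the pairwise correlations, take the 'A' column of the result, and slice the columns of the original DataFrame with a Boolean mask built from your cut-off condition. Note that 'A' itself is kept, since its correlation with itself is 1.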

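import pandas as pd
df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9,10],
                   'B': [1,2,4,3,5,7,6,8,10,11],
                   'C': [15,-1,17,-10,-10,-13,-99,-101,0,0],
                   'D': [0,10,0,0,-10,0,0,-10,0,10]})

# df.corr()['A'] is a Series of each column's correlation with A;
# comparing it to 0.2 gives a Boolean mask over the columns.
df.loc[:, df.corr()['A'] > 0.2]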

    A   B
0   1   1
1   2   2
2   3   4
3   4   3
4   5   5
5   6   7
6   7   6
7   8   8
8   9   10
9   10  11
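
One thing to watch with the mask above: it only keeps positive correlations. In this sample data, C has a correlation of roughly -0.42 with A, so it is dropped even though it is more strongly related to A than the 0.2 cut-off suggests. If negatively correlated predictors should count too, a minimal sketch (assuming that is what you want, and that all predictor columns are numeric) is to compare the absolute value against the threshold. The names `threshold`, `corr_with_a`, and `selected` are just illustrative; `corrwith` is used here so that only the correlations against A are computed, with A itself left out of the result.

```python
import pandas as pd

df = pd.DataFrame({'A': [1,2,3,4,5,6,7,8,9,10],
                   'B': [1,2,4,3,5,7,6,8,10,11],
                   'C': [15,-1,17,-10,-10,-13,-99,-101,0,0],
                   'D': [0,10,0,0,-10,0,0,-10,0,10]})

threshold = 0.2  # illustrative cut-off, adjust as needed

# Correlation of every predictor with A (A itself is excluded up front)
corr_with_a = df.drop(columns='A').corrwith(df['A'])

# Keep predictors whose correlation is strong in either direction
selected = corr_with_a[corr_with_a.abs() > threshold].index.tolist()

predictors = df[selected]
print(selected)
```

With the sample data this picks B and C and leaves out D, whose correlation with A is close to zero.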

Upvotes: 5
