Reputation: 105
I have a DataFrame with over 100 columns that I am using to build a model. One column (A) is the response, and all the other columns (B, C, D, etc.) are predictors. I want to select all the columns that are correlated with column A based on their correlation coefficient (say > 0.2). I have already generated a heatmap of the correlation coefficients between every pair of columns. Is there a quick way in pandas to get all the columns whose correlation coefficient with column A is above 0.2 (a threshold I will adjust if needed)? Thanks in advance!
Upvotes: 3
Views: 2551
Reputation: 59519
Use the DataFrame corr() method to calculate the correlations, then select the columns that meet your cut-off condition with a Boolean mask.
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'B': [1, 2, 4, 3, 5, 7, 6, 8, 10, 11],
                   'C': [15, -1, 17, -10, -10, -13, -99, -101, 0, 0],
                   'D': [0, 10, 0, 0, -10, 0, 0, -10, 0, 10]})

# Keep every column whose correlation with 'A' is above 0.2
df.loc[:, df.corr()['A'] > 0.2]
    A   B
0   1   1
1   2   2
2   3   4
3   4   3
4   5   5
5   6   7
6   7   6
7   8   8
8   9  10
9  10  11
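Note that the mask above also keeps A itself (its self-correlation is 1) and ignores strongly negative correlations, which are usually just as useful for a predictor. A minimal sketch that addresses both, assuming Pearson correlation and a hypothetical helper name correlated_predictors, uses DataFrame.corrwith to correlate each remaining column with the target only, instead of building the full pairwise matrix:

```python
import pandas as pd


def correlated_predictors(df: pd.DataFrame, target: str, threshold: float = 0.2) -> list:
    """Hypothetical helper: return the names of columns whose absolute
    Pearson correlation with `target` exceeds `threshold`, excluding
    `target` itself."""
    # Correlate every other column with the target only (no full matrix).
    corr = df.drop(columns=target).corrwith(df[target])
    # abs() keeps strong negative correlations as well; remove it to keep
    # only positively correlated columns.
    return corr[corr.abs() > threshold].index.tolist()


df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'B': [1, 2, 4, 3, 5, 7, 6, 8, 10, 11],
                   'C': [15, -1, 17, -10, -10, -13, -99, -101, 0, 0],
                   'D': [0, 10, 0, 0, -10, 0, 0, -10, 0, 10]})

predictors = correlated_predictors(df, 'A', threshold=0.2)
print(predictors)  # ['B', 'C']

# Response plus the selected predictors, ready for modelling
print(df[['A'] + predictors].head())
```

With the sample data, C is now selected because its correlation with A is about -0.42, while D (about -0.11) is still dropped.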
Upvotes: 5