Reputation: 853
I have a dataset with 50+ columns and would like to drop the features that correlate weakly with a target, using a loop so that I don't have to drop them manually.
I've tried:
for feature in df:
    if df[feature].corr() < threshold: df.drop(feature, axis=1, inplace=True)
...which obviously does not work. I'm quite new to Python.
Any advice would be appreciated.
Upvotes: 0
Views: 103
Reputation: 9941
Assuming that the target is in df['y']:
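import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': range(500),
    'b': np.random.randint(0, 500, 500),
    'c': range(500),
    'd': np.random.randint(0, 500, 500),
    'y': range(500)})
threshold = 0.5
# Check every column except the target against df['y']
for feature in [c for c in df.columns if c != 'y']:
    # Drop the feature if its absolute correlation with the target is below the threshold
    if abs(df[feature].corr(df['y'])) < threshold:
        del df[feature]
df.head()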
Output:
   a  c  y
0  0  0  0
1  1  1  1
2  2  2  2
3  3  3  3
4  4  4  4
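If you'd rather avoid the explicit loop, the same filter can probably be written with DataFrame.corrwith, which computes the correlation of every column with the target in one call. This is only a sketch under a few assumptions: all feature columns are numeric, the target column is named 'y', and drop_low_corr is a hypothetical helper name I made up for illustration. It returns a new frame instead of modifying df in place.

```python
import numpy as np
import pandas as pd


def drop_low_corr(df, target, threshold):
    """Return a copy of df without the features whose absolute
    correlation with the target column is below threshold.
    (Hypothetical helper; assumes all feature columns are numeric.)"""
    features = df.drop(columns=[target])
    # Absolute Pearson correlation of each feature with the target
    corr = features.corrwith(df[target]).abs()
    # A NaN correlation (e.g. a constant column) compares as False
    # here, so such columns are kept, just as in the loop version
    low = corr[corr < threshold].index
    return df.drop(columns=low)


rng = np.random.default_rng(0)  # seeded so the example is repeatable
df = pd.DataFrame({
    'a': range(500),
    'b': rng.integers(0, 500, 500),
    'c': range(500),
    'd': rng.integers(0, 500, 500),
    'y': range(500)})

result = drop_low_corr(df, 'y', 0.5)
print(list(result.columns))
```

Because it does not touch the original frame, you can try several thresholds on the same df and compare which columns survive.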
Upvotes: 1