behappy

Reputation: 35

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)

I am trying to select the columns of a dataframe whose correlation with another column is above a certain threshold in absolute value, like this:

df.loc[:, (df.corr()['col'] <= -0.05) | (df.corr()['col'] >= 0.05)]

But I am getting the error below:

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Also, if I try to select the columns with variance > 1, I get the same error:

df.loc[:, df.var() > 1]

Why am I getting an indexing error? I want to filter out the columns of the dataframe whose correlation with another column is between -0.05 and 0.05.

Can someone help me resolve this issue? I am not sure where I am going wrong.

Upvotes: 2

Views: 5893

Answers (1)

Falx

Reputation: 211

I think I found your problem.

First I tried to build my own test set; unfortunately, everything worked fine:

import pandas as pd

df = pd.DataFrame({
    "col": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "A": [1.1, 1.0, 1.0, 1.0, 1.0, 1.1],
    "B": [1.0, 2.1, 3.0, 3.9, 5.0, 6.0]
})
df.loc[:, (df.corr()['col'] <= -0.05) | (df.corr()['col'] >= 0.05)]

I got:

   col    B
0  1.0  1.0
1  2.0  2.1
2  3.0  3.0
3  4.0  3.9
4  5.0  5.0
5  6.0  6.0

But then, after rereading your error, I thought there might be columns in your data that the corr() method simply ignores, such as columns with an object dtype. In that case the boolean Series built from df.corr() has fewer labels than df has columns, so .loc cannot align the two.

If I build a new test set with a text column, I get the same error as you:

df = pd.DataFrame({
    "col": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "A": [1.1, 1.0, 1.0, 1.0, 1.0, 1.1],
    "B": [1.0, 2.1, 3.0, 3.9, 5.0, 6.0],
    "C": ["A", "B", "C", "D", "E", "F"]
})
df.corr()['col'] >= 0.05
df.loc[:, (df.corr()['col'] <= -0.05) | (df.corr()['col'] >= 0.05)]

Then I got:

pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
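The mismatch can be made visible by comparing the labels of the boolean mask with the columns of the dataframe. This is only a sketch under assumptions: it passes numeric_only=True (available in pandas 1.5 and later) so that skipping the text column is explicit, because recent pandas versions (2.0 and later, as far as I know) no longer drop non-numeric columns silently in corr(). The names corr, mask, mask_labels, frame_labels and missing are mine, for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "col": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "A": [1.1, 1.0, 1.0, 1.0, 1.0, 1.1],
    "B": [1.0, 2.1, 3.0, 3.9, 5.0, 6.0],
    "C": ["A", "B", "C", "D", "E", "F"],
})

# Assumption: pandas >= 1.5, where corr() accepts numeric_only.
# The text column "C" is left out of the correlation matrix.
corr = df.corr(numeric_only=True)["col"]
mask = (corr <= -0.05) | (corr >= 0.05)

mask_labels = list(mask.index)    # labels the boolean mask knows about
frame_labels = list(df.columns)   # labels .loc has to align the mask against

# Columns of df that have no entry in the mask
missing = [c for c in frame_labels if c not in mask_labels]
print("mask labels: ", mask_labels)
print("frame labels:", frame_labels)
print("missing:     ", missing)
```

Any column listed in missing has no True/False value in the mask, which is what .loc complains about.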

One way to fix this is to drop the weakly correlated columns by label instead of using a boolean mask:

df = df.drop(columns=df.corr().query("-0.05 < col < 0.05").index)
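If you would rather keep the boolean-mask style from the question, another option is to align the mask with df.columns before indexing. The following is a minimal sketch, not the only way to do it: it assumes pandas 1.5 or later (for numeric_only), and it assumes that columns missing from the correlation matrix should be treated as not selected (fill_value=False). The names aligned and selected are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "col": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "A": [1.1, 1.0, 1.0, 1.0, 1.0, 1.1],
    "B": [1.0, 2.1, 3.0, 3.9, 5.0, 6.0],
    "C": ["A", "B", "C", "D", "E", "F"],
})

# Assumption: pandas >= 1.5, where corr() accepts numeric_only.
corr = df.corr(numeric_only=True)["col"]
mask = (corr <= -0.05) | (corr >= 0.05)

# Give the mask one entry per column of df; columns that corr() skipped
# (here the text column "C") are marked False, i.e. not selected.
aligned = mask.reindex(df.columns, fill_value=False).astype(bool)

selected = df.loc[:, aligned]
print(selected.columns.tolist())
```

The same pattern should apply to the variance filter from the question, for example by building the mask from df.var(numeric_only=True) > 1 and reindexing it in the same way. If the text columns should be kept rather than dropped, fill_value=True would be the value to use.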

Note: Please keep in mind that you'll get quicker and more relevant answers if you provide a full sample of the non-working code so that your error can be reproduced easily ;)

Upvotes: 3
