martindzejky

Reputation: 396

Why does a column remain in a DataFrame's column index even after it is dropped?

Consider the following piece of code:

>>> import pandas
>>> data = pandas.DataFrame({'user': [1, 5, 3, 10], 'week': [1, 1, 3, 4], 'value1': [5, 4, 3, 2], 'value2': [1, 1, 1, 2]})
>>> data = data.pivot_table(index='user', columns='week', fill_value=0)
>>> data['target'] = [True, True, False, True]
>>> data
     value1       value2       target
week      1  3  4      1  3  4
user
1         5  0  0      1  0  0   True
3         0  3  0      0  1  0   True
5         4  0  0      1  0  0  False
10        0  0  2      0  0  2   True

Now if I call this:

>>> 'target' in data.columns
True

It returns True as expected. However, why does this return True as well?

>>> 'target' in data.drop('target', axis=1).columns
True

How can I drop a column from the table so it's no longer in the index and the above statement returns False?

Upvotes: 5

Views: 428

Answers (2)

Ilya

Reputation: 541

Posting @Jeff's comment as an answer: drop the column, then strip the unused labels from the column MultiIndex.

data = data.drop('target', axis=1)
data.columns = data.columns.remove_unused_levels()
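For completeness, here is a minimal end-to-end sketch of the two lines above, using the frame from the question. It assumes a pandas version that has MultiIndex.remove_unused_levels() (0.20 or later, as far as I know). The names dropped, still_in_levels, in_levels_after and in_columns_after are mine, for illustration only.

```python
import pandas as pd

# Rebuild the frame from the question.
data = pd.DataFrame({
    'user': [1, 5, 3, 10],
    'week': [1, 1, 3, 4],
    'value1': [5, 4, 3, 2],
    'value2': [1, 1, 1, 2],
})
data = data.pivot_table(index='user', columns='week', fill_value=0)
data['target'] = [True, True, False, True]

dropped = data.drop('target', axis=1)

# The column is gone, but its label is still stored in level 0.
still_in_levels = 'target' in dropped.columns.levels[0]

# Rebuild the column index so it only stores labels that are in use.
dropped.columns = dropped.columns.remove_unused_levels()
in_levels_after = 'target' in dropped.columns.levels[0]
in_columns_after = 'target' in dropped.columns

print(still_in_levels, in_levels_after, in_columns_after)
```

After the second step the label is gone from both the stored levels and the membership test, which is what the question asks for.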

Upvotes: 0

Zeugma

Reputation: 32105

As of now (pandas 0.19.2), a MultiIndex retains every label it has ever used in its levels. Dropping a column removes the column itself, but the label stays stored in the MultiIndex, so membership tests still find it. See the long GitHub issue on this.

Thus, you have to work around the issue and make assumptions. If you are sure the labels you're checking are on a specific index level (level 0 in your example), then one way is to do this:

'target' in data.drop('target', axis=1).columns.get_level_values(0)
Out[145]: False
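To illustrate why this check works, here is a small sketch contrasting the labels a MultiIndex stores (levels) with the labels its entries actually use (get_level_values). The index is built by hand to mimic the question's columns. The variable names are only for illustration, and drop(..., level=0) is used here just to keep the example compact.

```python
import pandas as pd

# Hand-built stand-in for the question's column index.
columns = pd.MultiIndex.from_tuples(
    [('value1', 1), ('value1', 3), ('value2', 1), ('target', '')]
)

# Drop every entry whose level-0 label is 'target'.
dropped = columns.drop('target', level=0)

# `levels` is the stored vocabulary; it is not pruned by the drop.
stored = list(dropped.levels[0])

# `get_level_values` reads the labels of the entries that remain.
in_use = list(dropped.get_level_values(0))

print('target' in stored)
print('target' in in_use)
```

The first membership test looks at the stored labels and the second only at the labels still in use, which is the distinction the answer relies on.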

If the label can be on any level, you can use get_values() to get the column tuples, flatten them into a single list, and look the label up in that list:

import itertools as it
list(it.chain.from_iterable(data.drop('target', axis=1).columns.get_values()))
Out[150]: ['value1', 1, 'value1', 3, 'value1', 4, 'value2', 1, 'value2', 3, 'value2', 4]
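On newer pandas the same idea needs a small adjustment: as far as I know, Index.get_values() was deprecated in 0.25 and removed in 1.0. Below is a sketch that avoids it by iterating the MultiIndex directly, which yields one tuple of labels per column. The helper name label_in_use is hypothetical, not a pandas function.

```python
import pandas as pd

def label_in_use(columns, label):
    """Hypothetical helper: True if `label` appears on any level of a column in use."""
    if isinstance(columns, pd.MultiIndex):
        # Iterating a MultiIndex yields one tuple of labels per column.
        return any(label in column for column in columns)
    return label in columns

# Rebuild the frame from the question.
data = pd.DataFrame({
    'user': [1, 5, 3, 10],
    'week': [1, 1, 3, 4],
    'value1': [5, 4, 3, 2],
    'value2': [1, 1, 1, 2],
})
data = data.pivot_table(index='user', columns='week', fill_value=0)
data['target'] = [True, True, False, True]
dropped = data.drop('target', axis=1)

before = label_in_use(data.columns, 'target')
after = label_in_use(dropped.columns, 'target')
week_found = label_in_use(dropped.columns, 3)

print(before, after, week_found)
```

Because it only looks at the tuples that are actually present, this check does not depend on whether the unused labels have been cleaned out of the levels.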

Upvotes: 4
