redacted

Reputation: 3969

Pandas dropping columns by index drops all columns with same name

Consider the following dataframe, which has two columns with the same name (apparently this does happen; I currently have a dataset like this! :( )

>>> import pandas as pd
>>> df = pd.DataFrame({"a": range(10, 15), "b": range(5, 10)})
>>> df.rename(columns={"b": "a"}, inplace=True)
>>> df

    a   a
0   10  5
1   11  6
2   12  7
3   13  8
4   14  9

>>> df.columns
Index(['a', 'a'], dtype='object')

I would expect that dropping by index removes only the column at that index, but apparently this is not the case: both columns are dropped.

>>> df.drop(df.columns[-1], axis=1)

0
1
2
3
4

Is there a way to get rid of columns with duplicated column names?

EDIT: I chose misleading values for the first column; fixed now.

EDIT2: the expected outcome is

  a
0 10
1 11
2 12 
3 13
4 14

Upvotes: 23

Views: 16452

Answers (1)

EdChum

Reputation: 394459

Actually just do this:

In [183]:
df.ix[:,~df.columns.duplicated()]

Out[183]:
    a
0  10
1  11
2  12
3  13
4  14

This selects all rows and then applies the boolean column mask returned by df.columns.duplicated(), inverted with ~, so that only the first column of each name is kept.

The output from duplicated:

In [184]:
df.columns.duplicated()

Out[184]:
array([False,  True], dtype=bool)
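
To make the mask concrete, here is a minimal sketch that rebuilds the question's frame and shows how the keep argument of Index.duplicated changes which column survives. The variable names (first_kept, last_kept, unique_only) are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the frame from the question: two columns that are both named "a".
df = pd.DataFrame({"a": range(10, 15), "b": range(5, 10)})
df = df.rename(columns={"b": "a"})

# Default keep="first": later repeats are flagged, so the first "a" survives.
first_kept = df.loc[:, ~df.columns.duplicated()]

# keep="last": earlier repeats are flagged, so the last "a" survives.
last_kept = df.loc[:, ~df.columns.duplicated(keep="last")]

# keep=False: every repeated name is flagged, so no "a" column survives.
unique_only = df.loc[:, ~df.columns.duplicated(keep=False)]
```

The default matches the expected outcome in the question; keep="last" is the variant to reach for when the later column holds the data you want.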

UPDATE

As .ix is deprecated (since v0.20.1, and removed entirely in pandas 1.0), use either of the following instead:

df.iloc[:,~df.columns.duplicated()]

or

df.loc[:,~df.columns.duplicated()]

Thanks to @DavideFiocco for alerting me
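
If the goal is to drop one specific column by position, rather than every repeat of a name, a positional boolean mask avoids the label lookup that makes drop remove both columns. A minimal sketch, assuming a hypothetical helper drop_column_at that is not part of pandas:

```python
import numpy as np
import pandas as pd


def drop_column_at(frame: pd.DataFrame, position: int) -> pd.DataFrame:
    """Return frame without the column at the given integer position.

    Hypothetical helper for illustration; not a pandas API.
    """
    n_cols = frame.shape[1]
    if not -n_cols <= position < n_cols:
        raise IndexError(f"column position {position} is out of range")
    # Map negative positions (e.g. -1 for the last column) to 0..n_cols-1.
    position %= n_cols
    # True for every column except the one being dropped.
    keep = np.arange(n_cols) != position
    return frame.iloc[:, keep]


# Same frame as in the question: two columns that are both named "a".
df = pd.DataFrame({"a": range(10, 15), "b": range(5, 10)})
df = df.rename(columns={"b": "a"})

# Drop only the last column, even though it shares its name with the first.
result = drop_column_at(df, -1)
```

Because the mask is built from positions, it works regardless of how many columns share a name.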

Upvotes: 27
