Duplicate column Pandas dataframe slice issue

Question

I have a dataframe df with duplicate column names. (I need to keep the duplicate columns because the dataframe is passed as a parameter to matplotlib for plotting, so the columns' names and contents may be the same or different.)

>>> df
                        PE     RT    Ttl_mkv      PE
STK_ID    RPT_Date
11_STK79  20130115  41.932  2.744   3629.155  41.932
21_STK58  20130115  14.223  0.048  30302.324  14.223
22_STK229 20130115  22.436  0.350  15968.313  22.436
23_STK34  20130115 -63.252  0.663   4168.189 -63.252

I can get the second column with df[df.columns[1]]:

>>> df[df.columns[1]]
STK_ID     RPT_Date
11_STK79   20130115    2.744
21_STK58   20130115    0.048
22_STK229  20130115    0.350
23_STK34   20130115    0.663

But if I try to get the first column with df[df.columns[0]], it gives:

>>> df[df.columns[0]]
                        PE      PE
STK_ID    RPT_Date
11_STK79  20130115  41.932  41.932
21_STK58  20130115  14.223  14.223
22_STK229 20130115  22.436  22.436
23_STK34  20130115 -63.252 -63.252

This returns two columns. That breaks my application, which expects only the first column, but pandas returns both the 1st and 4th columns. Is this a bug, or is it designed this way on purpose? How can I work around it?

My pandas version is 0.8.1.

Rutger Kassies · Accepted Answer

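I don't really understand why you need two columns with the same name; avoiding that would probably be best.

But to answer your question, this would return only one of the 'PE' columns (it works here because both 'PE' columns hold identical values, so one of them is dropped as a duplicate):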

df.T.drop_duplicates().T.PE

STK_ID     RPT_Date
11_STK79   20130115    41.932
21_STK58   20130115    14.223
22_STK229  20130115    22.436
23_STK34   20130115   -63.252
Name: PE

or:

df.T.ix[0].T
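
For readers on a more recent pandas, where .ix has been removed, here is a minimal sketch of the same idea using positional indexing. The frame is rebuilt by hand from the values in the question, and the variable names (by_label, first_col, deduped) are only illustrative. It also shows one way to keep just the first occurrence of each column name, which, unlike the drop_duplicates approach, does not require the duplicated columns to hold identical values.

```python
import pandas as pd

# Hypothetical reconstruction of the frame from the question:
# two columns share the name "PE".
index = pd.MultiIndex.from_tuples(
    [("11_STK79", 20130115), ("21_STK58", 20130115),
     ("22_STK229", 20130115), ("23_STK34", 20130115)],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    [[41.932, 2.744, 3629.155, 41.932],
     [14.223, 0.048, 30302.324, 14.223],
     [22.436, 0.350, 15968.313, 22.436],
     [-63.252, 0.663, 4168.189, -63.252]],
    index=index,
    columns=["PE", "RT", "Ttl_mkv", "PE"],
)

# Label-based lookup matches every column named "PE",
# so the result is a two-column DataFrame.
by_label = df["PE"]

# Position-based lookup returns exactly one column as a Series.
first_col = df.iloc[:, 0]

# Keep only the first occurrence of each column name.
deduped = df.loc[:, ~df.columns.duplicated()]
```

Selecting by position (df.iloc[:, 0]) is probably the closest match to what the question asks for, since it never depends on the column labels being unique.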
