Reputation: 59514
I have a dataframe df with duplicate columns (I need a dataframe with duplicate columns because it will be passed as a parameter to matplotlib for plotting, so the columns' names and contents might be the same or different):
>>> df
                         PE     RT    Ttl_mkv       PE
STK_ID    RPT_Date
11_STK79  20130115   41.932  2.744   3629.155   41.932
21_STK58  20130115   14.223  0.048  30302.324   14.223
22_STK229 20130115   22.436  0.350  15968.313   22.436
23_STK34  20130115  -63.252  0.663   4168.189  -63.252
I can get the second column with df[df.columns[1]]:
>>> df[df.columns[1]]
STK_ID    RPT_Date
11_STK79  20130115    2.744
21_STK58  20130115    0.048
22_STK229 20130115    0.350
23_STK34  20130115    0.663
But if I try to get the first column with df[df.columns[0]], it gives:
>>> df[df.columns[0]]
                         PE       PE
STK_ID    RPT_Date
11_STK79  20130115   41.932   41.932
21_STK58  20130115   14.223   14.223
22_STK229 20130115   22.436   22.436
23_STK34  20130115  -63.252  -63.252
This result has two columns. That breaks my application, which wants only the first column, but pandas gives it both the 1st and the 4th column! Is this a bug, or is it designed this way on purpose? How can I work around it?
My pandas version is 0.8.1.
Upvotes: 0
Views: 1166
Reputation: 64443
I don't really understand why you need two columns with the same name; avoiding that would probably be best.
But to answer your question, this returns only one of the 'PE' columns, provided the duplicated columns hold identical values:
df.T.drop_duplicates().T.PE
STK_ID    RPT_Date
11_STK79  20130115    41.932
21_STK58  20130115    14.223
22_STK229 20130115    22.436
23_STK34  20130115   -63.252
Name: PE
or:
df.T.ix[0].T
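For anyone reading this on a newer pandas, where .ix is no longer available, selecting by position with iloc is one way to get exactly one column even when names repeat. This is only a minimal sketch, assuming a recent pandas release: the frame is rebuilt by hand from the values in the question, and the variable names are purely illustrative.

```python
import pandas as pd

# Hypothetical reconstruction of the frame from the question (values copied from it).
index = pd.MultiIndex.from_tuples(
    [
        ("11_STK79", 20130115),
        ("21_STK58", 20130115),
        ("22_STK229", 20130115),
        ("23_STK34", 20130115),
    ],
    names=["STK_ID", "RPT_Date"],
)
data = [
    [41.932, 2.744, 3629.155, 41.932],
    [14.223, 0.048, 30302.324, 14.223],
    [22.436, 0.350, 15968.313, 22.436],
    [-63.252, 0.663, 4168.189, -63.252],
]
df = pd.DataFrame(data, index=index, columns=["PE", "RT", "Ttl_mkv", "PE"])

# Label-based lookup matches every column named "PE", so it yields a DataFrame.
by_label = df["PE"]

# Position-based lookup yields exactly one column as a Series,
# regardless of whether the column name is duplicated.
first_col = df.iloc[:, 0]
fourth_col = df.iloc[:, 3]

# Alternative: keep only the first occurrence of each column name.
deduped = df.loc[:, ~df.columns.duplicated()]
```

Unlike the transpose and drop_duplicates approach, neither of these depends on the duplicated columns having the same contents, which matters if the two 'PE' columns can differ.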
Upvotes: 2