Raj
Raj

Reputation: 1665

In Pandas, a duplicate column is appearing with ".1" appended to name

With the following code for a .xlsx file with a Multiindex in the columns:

import pandas as pd

df = pd.read_excel('Total Data.xlsx', header=[0,1], sheet_name="Control (E)")

c = df.xs('Time', drop_level=False, axis=1, level=1).columns

print(c.levels[1])

I get the following output:

c.levels[1]
Index([' SV (Nexn)', '%CO2', '%O2', 'CO (Nexn)', 'DBP (Calc)',
       'EtCO2 (Nonin)', 'HR (Calc)', 'HR (Nexn)', 'MAP (Calc)',
       'MAP (Nexn)', 'MV (calc)', 'MV (calc).1', 'RR (Nonin)', 'SBP (Calc)',
       'Subject', 'TV (Calc)', 'Time', 'VO2 (calc)', 'Vi'],
      dtype='object')

It appears to have something to do with the empty column I have separating the columns at level 0 (namely, Baseline and Stage 1), which subsequently also separates the level 1 columns.

This is a screenshot of the Excel file's multiindex header: enter image description here

Why is MV (calc).1 showing up as a column when it's not actually in the Excel file, and how can I avoid it from occurring?

Upvotes: 0

Views: 750

Answers (1)

CompanyTime
CompanyTime

Reputation: 39

You most likely have the same column name twice in your excel sheet.

All keys must be unique, which is why Pandas is appending a ".1" to the end of the second occurrence of a duplicate column name.

Upvotes: 3

Related Questions