Reputation: 5314
I am reading data from an Excel file into a pandas DataFrame using read_excel()
. Unfortunately, it seems difficult to ensure the formatting of cells in Excel and so it happens that a table like this:
2018 2019
a b a b
0 1.295666 -0.544973 0.845973 -0.874668
1 0.590123 0.284364 -1.482706 -0.859350
2 0.832228 0.469992 0.994865 0.480301
3 0.098671 0.198643 0.878323 -0.119761
...actually has surprising indices or columns:
df.columns
MultiIndex(levels=[[2018, 2019, '2019'], ['a', 'b']],
labels=[[0, 0, 1, 2], [0, 1, 0, 1]])
As you can see, the primary index of the last column actually has a string for 2019 and not an integer like the others.
To be on the safe side, I would like to convert all indices to str
ing, but pandas wont let me:
df.columns.set_levels(df.columns.levels[0].astype(str), level=0)
ValueError: Level values must be unique: ['2018', '2019', '2019'] on level 0
I see two approaches to solve this:
read_excel()
convert column headers to string orset_levels()
as in my example above to work.But I can't get either to work - any hints?
Upvotes: 0
Views: 184
Reputation: 323326
You can re-create your multiple index
for columns
idx=pd.MultiIndex.from_product([df.columns.levels[0].astype(int).unique(),df.columns.levels[1]])
df.columns=idx
df.columns
MultiIndex(levels=[[2018, 2019], ['a', 'b']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
From op better layout format
df.columns = pd.MultiIndex.from_product([c.astype(str).unique() for c in df.columns.levels])
Update / Caveat
This solution can lead to some headache. data.columns.codes
(formally know as data.columns.labels
) does not necessarily come in an increasing order from read_excel()
, e.g. FrozenList([[3, 3, 2, 2, 1, 1, 0, 0], [1, 0, 1, 0, 1, 0, 1, 0]])
can occur. When using the .from_product()
approach here, this will cause trouble and change the order of the column names... A workaround is to save its state and write it back after the deed:
old_col_codes = df.columns.codes
df.columns = pd.MultiIndex.from_product([c.astype(str).unique() for c in df.columns.levels])
df.columns.set_codes(old_df_codes, inplace=True)
Upvotes: 1