How to avoid unnecessary multi-index entries in pandas dataframe concat?

Question

I have the following data:

import pandas as pd

df1 = pd.DataFrame({'Room': [1, 2, 3, 5, 8], 'User': 'Martin', 'Task': 'Play', 1: [1, 2, 3, 4, 5]}).set_index(['Room', 'User', 'Task'])
df2 = pd.DataFrame({'Room': [1, 2, 3, 5, 8], 'User': 'Martin', 'Task': 'Play', 2: [1, 2, 3, 4, 5]}).set_index(['Room', 'User', 'Task'])
df3 = pd.DataFrame({'Room': [1, 2, 3, 5, 8], 'User': 'Martin', 'Task': 'Clean', 1: [6, 7, 8, 9, 10]}).set_index(['Room', 'User', 'Task'])
df4 = pd.DataFrame({'Room': [1, 2, 3, 5, 8], 'User': 'Martin', 'Task': 'Clean', 2: [6, 7, 8, 9, 10]}).set_index(['Room', 'User', 'Task'])
df = pd.concat([df1, df2, df3, df4]).sort_index()

The output looks like this:

I wonder why each multi-index key is repeated once per column, with NaN in the other column. I expected, and want, the output to look like this, where every multi-index key occurs only once and all NaN values are gone:

This would significantly reduce the size of my dataframe and, later on, the storage size on the physical drive.


Answers (1)
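
One possible approach, offered as a sketch rather than the definitive fix: by default pd.concat stacks its inputs row-wise (axis=0), so every frame contributes its own rows even when the index keys match, and the columns a frame lacks are filled with NaN. Assuming each key has at most one non-NaN value per column, as in the example data, the duplicate keys can be collapsed afterwards with groupby(level=...).first(), which keeps the first non-null value per column. Alternatively, frames that share the same keys can be joined column-wise with axis=1 before stacking. The names idx, rooms, stacked, merged and side_by_side below are illustrative only.

```python
import pandas as pd

idx = ['Room', 'User', 'Task']
rooms = [1, 2, 3, 5, 8]

# Same data as in the question.
df1 = pd.DataFrame({'Room': rooms, 'User': 'Martin', 'Task': 'Play', 1: [1, 2, 3, 4, 5]}).set_index(idx)
df2 = pd.DataFrame({'Room': rooms, 'User': 'Martin', 'Task': 'Play', 2: [1, 2, 3, 4, 5]}).set_index(idx)
df3 = pd.DataFrame({'Room': rooms, 'User': 'Martin', 'Task': 'Clean', 1: [6, 7, 8, 9, 10]}).set_index(idx)
df4 = pd.DataFrame({'Room': rooms, 'User': 'Martin', 'Task': 'Clean', 2: [6, 7, 8, 9, 10]}).set_index(idx)

# Row-wise concat: every key appears twice, each row half-filled with NaN.
stacked = pd.concat([df1, df2, df3, df4]).sort_index()

# Option A: collapse duplicate keys, keeping the first non-null value per column.
# Values become floats here because the intermediate frame contained NaN.
merged = stacked.groupby(level=idx).first()

# Option B: join frames with identical keys column-wise, then stack the tasks.
# No NaN is ever introduced, so the integer dtype is preserved.
side_by_side = pd.concat(
    [pd.concat([df1, df2], axis=1), pd.concat([df3, df4], axis=1)]
).sort_index()

print(merged)
```

Option A works without knowing in advance which frames belong together, but it silently drops values if a key really has two different non-null entries in the same column. Option B avoids the NaN-filled intermediate frame, at the cost of having to pair the frames by hand.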
