user7638008
user7638008

Reputation: 147

How to avoid unnecessary multi-index entries in pandas dataframe concat?

I have the following data:

df1 = pd.DataFrame({'Room': [1, 2, 3, 5, 8], 'User': 'Martin', 'Task': 'Play', 1: [1, 2, 3, 4, 5]}).set_index(['Room', 'User', 'Task'])
df2 = pd.DataFrame({'Room': [1, 2, 3, 5, 8], 'User': 'Martin', 'Task': 'Play', 2: [1, 2, 3, 4, 5]}).set_index(['Room', 'User', 'Task'])
df3 = pd.DataFrame({'Room': [1, 2, 3, 5, 8], 'User': 'Martin', 'Task': 'Clean', 1: [6, 7, 8, 9, 10]}).set_index(['Room', 'User', 'Task'])
df4 = pd.DataFrame({'Room': [1, 2, 3, 5, 8], 'User': 'Martin', 'Task': 'Clean', 2: [6, 7, 8, 9, 10]}).set_index(['Room', 'User', 'Task'])
df = pd.concat([df1, df2, df3, df4]).sort_index()

And the output result looks like:

enter image description here

I wonder why the multi-index has a duplicate entry for each column there is. I expected and want the output format to be like this, where all the multi-index keys only occur once and all NaN values are gone: enter image description here

This would significantly reduce the size of my dataframe and also later on the storage size on the phy. drive.

Upvotes: 0

Views: 47

Answers (1)

jezrael
jezrael

Reputation: 862521

If is possible sum values:

df = df.sum(level=[0,1,2])
#alternative
#df = df.groupby(level=[0,1,2]).sum()
print (df)
                      1     2
Room User   Task             
1    Martin Clean   6.0   6.0
            Play    1.0   1.0
2    Martin Clean   7.0   7.0
            Play    2.0   2.0
3    Martin Clean   8.0   8.0
            Play    3.0   3.0
5    Martin Clean   9.0   9.0
            Play    4.0   4.0
8    Martin Clean  10.0  10.0
            Play    5.0   5.0

If possible get only first non missing value:

df = df.groupby(level=[0,1,2], sort=False).first()
print (df)
                      1     2
Room User   Task             
1    Martin Clean   6.0   6.0
            Play    1.0   1.0
2    Martin Clean   7.0   7.0
            Play    2.0   2.0
3    Martin Clean   8.0   8.0
            Play    3.0   3.0
5    Martin Clean   9.0   9.0
            Play    4.0   4.0
8    Martin Clean  10.0  10.0
            Play    5.0   5.0

Upvotes: 1

Related Questions