user9413641
user9413641

Reputation:

alternative to df.stack().unstack()

I am reading a bunch of tables from an excel file, all of which have different indices and multiple columns, and combining them on the index column.

What I end up with is a dataframe like the following, where each column has one entry for each index value, and where the indices are duplicated:

import pandas as pd
import numpy as np


nan = np.nan

df = pd.DataFrame([[1, nan, nan, nan],
                   [nan, 2, nan, nan],
                   [nan, nan, 3, nan],
                   [nan, nan, nan, 4],
                   [5, nan, nan, nan],
                   [nan, 6, nan, nan]],
                  index = [1, 1, 1, 1, 2, 2])

print(df)
#      0    1    2    3
# 1  1.0  NaN  NaN  NaN
# 1  NaN  2.0  NaN  NaN
# 1  NaN  NaN  3.0  NaN
# 1  NaN  NaN  NaN  4.0
# 2  5.0  NaN  NaN  NaN
# 2  NaN  6.0  NaN  NaN

..What I then do to get unique indices and get rid of all the nan values is the following:

df = df.stack().unstack()

print(df)
#      0    1    2    3
# 1  1.0  2.0  3.0  4.0
# 2  5.0  6.0  NaN  NaN

..Which seems a bit awkward/inefficient, so my question is:

Is there a better alternative to .stack().unstack() for accomplishing this?

Upvotes: 1

Views: 324

Answers (1)

ThePyGuy
ThePyGuy

Reputation: 18426

Just group the dataframe by index, then call first: the result is same to what you get after consecutive stack and unstack for the sample data you have.

>>> df.groupby(level=0).first()
     0    1    2    3
1  1.0  2.0  3.0  4.0
2  5.0  6.0  NaN  NaN

Upvotes: 1

Related Questions