I am reading a bunch of tables from an Excel file, all of which have different indices and multiple columns, and combining them on the index column.
What I end up with is a dataframe like the following, where each column has one entry for each index value, and where the indices are duplicated:
import pandas as pd
import numpy as np
nan = np.nan
df = pd.DataFrame([[1,   nan, nan, nan],
                   [nan, 2,   nan, nan],
                   [nan, nan, 3,   nan],
                   [nan, nan, nan, 4],
                   [5,   nan, nan, nan],
                   [nan, 6,   nan, nan]],
                  index=[1, 1, 1, 1, 2, 2])
print(df)
#      0    1    2    3
# 1  1.0  NaN  NaN  NaN
# 1  NaN  2.0  NaN  NaN
# 1  NaN  NaN  3.0  NaN
# 1  NaN  NaN  NaN  4.0
# 2  5.0  NaN  NaN  NaN
# 2  NaN  6.0  NaN  NaN
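The exact read/combine step isn't important for the question, but for concreteness: a row-wise pd.concat of per-sheet tables like the sketch below produces this shape (the sheet contents here are made up):
import pandas as pd

# made-up per-sheet tables; concatenating them row-wise repeats the index
# and scatters each table's columns into NaN-padded positions
tables = [
    pd.DataFrame({0: [1.0, 5.0]}, index=[1, 2]),
    pd.DataFrame({1: [2.0, 6.0]}, index=[1, 2]),
    pd.DataFrame({2: [3.0]}, index=[1]),
    pd.DataFrame({3: [4.0]}, index=[1]),
]
combined = pd.concat(tables)  # same shape as df above, row order aside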
What I then do to get unique indices and get rid of all the NaN values is the following:
df = df.stack().unstack()
print(df)
#      0    1    2    3
# 1  1.0  2.0  3.0  4.0
# 2  5.0  6.0  NaN  NaN
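The intermediate df.stack() is a Series keyed by (index, column) with the NaN entries dropped (under the default dropna behaviour of the pandas version I'm using), which is what lets unstack() collapse the duplicated index:
print(df.stack())
# 1  0    1.0
#    1    2.0
#    2    3.0
#    3    4.0
# 2  0    5.0
#    1    6.0
# dtype: float64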
This seems a bit awkward/inefficient, so my question is:
Is there a better alternative to .stack().unstack() for accomplishing this?
Upvotes: 1
Views: 324
Reputation: 18426
Just group the dataframe by the index (level 0), then call first(): for the sample data you have, the result is the same as what you get from the consecutive stack() and unstack().
>>> df.groupby(level=0).first()
     0    1    2    3
1  1.0  2.0  3.0  4.0
2  5.0  6.0  NaN  NaN
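One caveat, with an illustration of my own rather than from the question: the two approaches coincide only when each (index, column) pair holds at most one non-NaN value, as in the sample data. If a group has two values in the same column, first() keeps the first non-NaN value per column, while stack().unstack() fails because the stacked index then contains duplicates:
import pandas as pd
import numpy as np
nan = np.nan

# hypothetical frame where index 1 has two values in column 0
clash = pd.DataFrame([[1,   nan],
                      [9,   nan],
                      [nan, 2]],
                     index=[1, 1, 2])

print(clash.groupby(level=0).first())
#      0    1
# 1  1.0  NaN
# 2  NaN  2.0

# clash.stack().unstack() would instead raise a ValueError, because the
# stacked index contains the duplicate entry (1, 0) and cannot be reshaped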
Upvotes: 1