puifais
puifais

Reputation: 870

Create multiindex from existing dataframe

I've spent hours browsing everywhere now to try to create a multiindex from dataframe in pandas. This is the dataframe I have (posting excel sheet mockup. I do have this in pandas dataframe):

have

And this is what I want:

want

I have tried

newmulti = currentDataFrame.set_index(['user_id','account_num'])

But it returns a dataframe, not a multiindex. Also, I could not figure out how to make 'user_id' level 0 and 'account_num' level 1. I think this must be trivial but I've read so many posts, tutorials, etc. and still could not figure it out. Partly because I'm a very visual person and most posts are not. Please help!

Upvotes: 31

Views: 52619

Answers (6)

user26361893
user26361893

Reputation: 1

You can do the following to get your desired result:

newmulti = currentDataFrame.set_index(['user_id','account_num'])
newmulti.sort_index(level=[0,1], axis=0, inplace=True)

Upvotes: 0

user41855
user41855

Reputation: 1037

There are two ways to do it, albeit not exactly like you have shown, but it works.
Say you have the following df:

      A   B    C      D
0   nil one    1    NaN
1   bar one    5    5.0
2   foo two    3    8.0
3   bar three  2    1.0
4   foo two    4    2.0
5   bar two    6    NaN

1. Workaround 1:

df.set_index('A', append = True, drop = False).reorder_levels(order = [1,0]).sort_index()

This will return:

enter image description here

2. Workaround 2:

df.set_index(['A', 'B']).sort_index()

This will return:
enter image description here

Upvotes: 2

Eulenfuchswiesel
Eulenfuchswiesel

Reputation: 919

For clarification of future users I would like to add the following:

As said by Alexander,

df.set_index(['user_id', 'account_num', 'dates'])

with a possible inplace=True does the job.

The type(df) gives

pandas.core.frame.DataFrame

whereas type(df.index) is indeed the expected

pandas.core.indexes.multi.MultiIndex

Upvotes: 9

piRSquared
piRSquared

Reputation: 294198

Use pd.MultiIndex.from_arrays

lvl0 = currentDataFrame.user_id.values
lvl1 = currentDataFrame.account_num.values

midx = pd.MultiIndex.from_arrays([lvl0, lvl1], names=['level 0', 'level 1'])

Upvotes: 3

Alexander
Alexander

Reputation: 109510

You could simply use groupby in this case, which will create the multi-index automatically when it sums the sales along the requested columns.

df.groupby(['user_id', 'account_num', 'dates']).sales.sum().to_frame()

You should also be able to simply do this:

df.set_index(['user_id', 'account_num', 'dates'])

Although you probably want to avoid any duplicates (e.g. two or more rows with identical user_id, account_num and date values but different sales figures) by summing them, which is why I recommended using groupby.

If you need the multi-index, you can simply access viat new_df.index where new_df is the new dataframe created from either of the two operations above.

And user_id will be level 0 and account_num will be level 1.

Upvotes: 37

user2047399
user2047399

Reputation:

The DataFrame returned by currentDataFrame.set_index(['user_id','account_num']) has it's index set to ['user_id','account_num']

newmulti.index will return the MultiIndex object.

Upvotes: 1

Related Questions