Reputation: 870
I've spent hours browsing everywhere now to try to create a multiindex from dataframe in pandas. This is the dataframe I have (posting excel sheet mockup. I do have this in pandas dataframe):
And this is what I want:
I have tried
newmulti = currentDataFrame.set_index(['user_id','account_num'])
But it returns a dataframe, not a multiindex. Also, I could not figure out how to make 'user_id' level 0 and 'account_num' level 1. I think this must be trivial but I've read so many posts, tutorials, etc. and still could not figure it out. Partly because I'm a very visual person and most posts are not. Please help!
Upvotes: 31
Views: 52619
Reputation: 1
You can do the following to get your desired result:
newmulti = currentDataFrame.set_index(['user_id','account_num'])
newmulti.sort_index(level=[0,1], axis=0, inplace=True)
Upvotes: 0
Reputation: 1037
There are two ways to do it, albeit not exactly like you have shown, but it works.
Say you have the following df:
A B C D
0 nil one 1 NaN
1 bar one 5 5.0
2 foo two 3 8.0
3 bar three 2 1.0
4 foo two 4 2.0
5 bar two 6 NaN
1. Workaround 1:
df.set_index('A', append = True, drop = False).reorder_levels(order = [1,0]).sort_index()
This will return:
2. Workaround 2:
df.set_index(['A', 'B']).sort_index()
Upvotes: 2
Reputation: 919
For clarification of future users I would like to add the following:
As said by Alexander,
df.set_index(['user_id', 'account_num', 'dates'])
with a possible inplace=True
does the job.
The type(df)
gives
pandas.core.frame.DataFrame
whereas type(df.index)
is indeed the expected
pandas.core.indexes.multi.MultiIndex
Upvotes: 9
Reputation: 294198
lvl0 = currentDataFrame.user_id.values
lvl1 = currentDataFrame.account_num.values
midx = pd.MultiIndex.from_arrays([lvl0, lvl1], names=['level 0', 'level 1'])
Upvotes: 3
Reputation: 109510
You could simply use groupby
in this case, which will create the multi-index automatically when it sums the sales along the requested columns.
df.groupby(['user_id', 'account_num', 'dates']).sales.sum().to_frame()
You should also be able to simply do this:
df.set_index(['user_id', 'account_num', 'dates'])
Although you probably want to avoid any duplicates (e.g. two or more rows with identical user_id
, account_num
and date
values but different sales figures) by summing them, which is why I recommended using groupby
.
If you need the multi-index, you can simply access viat new_df.index
where new_df
is the new dataframe created from either of the two operations above.
And user_id
will be level 0 and account_num
will be level 1.
Upvotes: 37
Reputation:
The DataFrame returned by currentDataFrame.set_index(['user_id','account_num'])
has it's index set to ['user_id','account_num']
newmulti.index
will return the MultiIndex object.
Upvotes: 1