Reputation: 1773
I have a DataFrame that looks like
    Emp1  Empl2        date Company
0      0      0  2012-05-01   apple
1      0      1  2012-05-29   apple
2      0      1  2013-05-02   apple
3      0      1  2013-11-22   apple
18     1      0  2011-09-09  google
19     1      0  2012-02-02  google
20     1      0  2012-11-26  google
21     1      0  2013-05-11  google
I want to set a MultiIndex on this DataFrame using the Company and date columns. It currently has the default index. I am using
df.set_index(['Company', 'date'], inplace=True)
But when I print the result, it prints None. Is this not the correct way of doing it? I also want to order the levels so that Company becomes the first level of the index and date becomes the second in the hierarchy. Any ideas on this?
Upvotes: 69
Views: 128636
Reputation: 23381
The result of set_index() is a copy, so you can assign it back to df (instead of using the inplace= parameter).
df = df.set_index(['Company', 'date'])
Note that set_index() overwrites the old index by default. You can keep the old index and append the new levels to it via the append= parameter.
df = df.set_index(['Company', 'date'], append=True)
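As a minimal sketch of the difference, assuming an abbreviated four-row sample modeled on the question's DataFrame (the variable names replaced, appended and by_date are only illustrative), the following compares the default behaviour with append=True. It also shows that the order of the list passed to set_index() decides the order of the levels, which is what the question asks about.

```python
import pandas as pd

# Abbreviated sample modeled on the question's DataFrame (not the full data).
df = pd.DataFrame({
    "Emp1": [0, 0, 1, 1],
    "Empl2": [0, 1, 0, 0],
    "date": pd.to_datetime(["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"]),
    "Company": ["apple", "apple", "google", "google"],
})

# Default: the old integer index is replaced by the two new levels.
replaced = df.set_index(["Company", "date"])
print(list(replaced.index.names))   # ['Company', 'date']

# append=True: the old index is kept as the outermost (unnamed) level.
appended = df.set_index(["Company", "date"], append=True)
print(list(appended.index.names))   # [None, 'Company', 'date']

# The list order decides the level order: date first, Company second.
by_date = df.set_index(["date", "Company"])
print(list(by_date.index.names))    # ['date', 'Company']
```

Because none of these calls use inplace=, df itself still has its default index and all four columns afterwards.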
The new index doesn't need to come from the columns. You can pass a pandas Series or a NumPy array of the same length as the DataFrame to set_index().
new_idx = pd.Series(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
df = df.set_index([new_idx, 'date'])
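Here is a self-contained sketch of the same idea on a smaller, made-up four-row frame. The Series name "label" is an assumption added for illustration; when the Series has a name, that name becomes the level name, and an unnamed Series gives an unnamed level.

```python
import pandas as pd

# Small made-up frame; only the column names come from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"]),
    "Company": ["apple", "apple", "google", "google"],
})

# An external Series with the same length as the frame (hypothetical labels).
labels = pd.Series(["a", "b", "c", "d"], name="label")

# Mix the external Series with an existing column name.
indexed = df.set_index([labels, "date"])
print(list(indexed.index.names))   # ['label', 'date']
print(indexed.columns.tolist())    # ['Company']
```

The date column is moved into the index, while Company stays a regular column.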
To set a brand new MultiIndex, you can build a pd.MultiIndex object. Depending on what you use to build the index, there are convenient constructors: from_arrays(), from_tuples(), and from_product().
For example, if you want to create a MultiIndex from the Cartesian product of lst1 and lst2, you can do so by calling from_product(). Note that the length of the MultiIndex must match the length of the DataFrame for this to work.
lst1 = ['a', 'b', 'c', 'd']
lst2 = [100, 200]
df.index = pd.MultiIndex.from_product([lst1, lst2])
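A runnable sketch of the length requirement, assuming a placeholder eight-row frame with a single made-up column named value. The level names "letter" and "number" are also assumptions added for readability. The second part shows that assigning the same 8-entry index to a 6-row frame raises a ValueError.

```python
import pandas as pd

lst1 = ["a", "b", "c", "d"]
lst2 = [100, 200]

# 4 * 2 = 8 combinations, so the frame needs exactly 8 rows.
df = pd.DataFrame({"value": range(8)})
df.index = pd.MultiIndex.from_product([lst1, lst2], names=["letter", "number"])
print(len(df.index))      # 8
print(df.index.nlevels)   # 2

# A frame with a different number of rows rejects the same index.
short = pd.DataFrame({"value": range(6)})
try:
    short.index = pd.MultiIndex.from_product([lst1, lst2])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)    # True
```

from_product() varies the last list fastest, so the rows are labelled (a, 100), (a, 200), (b, 100), and so on.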
Upvotes: 6
Reputation: 375845
When you pass inplace=True, set_index() makes the changes on the original variable. The function does not return the modified DataFrame; it returns None.
is_none = df.set_index(['Company', 'date'], inplace=True)
df # the dataframe you want
is_none # has the value None
so when you have a line like:
df = df.set_index(['Company', 'date'], inplace=True)
it first modifies df ... but then it sets df to None!
That is, you should just use the line:
df.set_index(['Company', 'date'], inplace=True)
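To make both cases concrete, here is a short sketch on a made-up two-row frame. The helper make_df() and the variable names result and broken are hypothetical, introduced only so each case starts from a fresh DataFrame.

```python
import pandas as pd

def make_df():
    # Hypothetical helper: builds a tiny frame with the question's column names.
    return pd.DataFrame({
        "Emp1": [0, 1],
        "date": pd.to_datetime(["2012-05-01", "2011-09-09"]),
        "Company": ["apple", "google"],
    })

# Correct use of inplace=True: call it as a statement and ignore the return value.
df = make_df()
result = df.set_index(["Company", "date"], inplace=True)
print(result)                  # None
print(list(df.index.names))    # ['Company', 'date']

# The pitfall: assigning the return value back throws the DataFrame away.
broken = make_df()
broken = broken.set_index(["Company", "date"], inplace=True)
print(broken)                  # None
```

Either use inplace=True without assignment, or assign the result without inplace=True, but do not combine the two.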
Upvotes: 99