user3527975

Reputation: 1773

Set MultiIndex of an existing DataFrame in pandas

I have a DataFrame that looks like

  Emp1    Empl2           date       Company
0    0        0     2012-05-01         apple
1    0        1     2012-05-29         apple
2    0        1     2013-05-02         apple
3    0        1     2013-11-22         apple
18   1        0     2011-09-09        google
19   1        0     2012-02-02        google
20   1        0     2012-11-26        google
21   1        0     2013-05-11        google

I want to use the Company and date columns as a MultiIndex for this DataFrame. Currently it has the default index. I am using

df.set_index(['Company', 'date'], inplace=True)

But when I print the result, it prints None. Is this not the correct way to do it? I also want to order the index levels so that Company becomes the first level and date becomes the second in the hierarchy. Any ideas on this?

Upvotes: 69

Views: 128636

Answers (2)

cottontail

Reputation: 23381

By default, set_index() returns a new DataFrame and leaves the original unchanged, so you can assign the result back to df (instead of using the inplace= parameter).

df = df.set_index(['Company', 'date'])
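
As a minimal sketch of that behaviour, the snippet below uses a small made-up frame that mirrors a few rows of the question's table (the variable name indexed is only for illustration). The index levels follow the order of the list passed in, so Company becomes the outer level and date the inner one.

```python
import pandas as pd

# Hypothetical sample data mirroring a few rows of the question's table
df = pd.DataFrame({
    "Emp1": [0, 0, 1, 1],
    "Empl2": [0, 1, 0, 0],
    "date": ["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"],
    "Company": ["apple", "apple", "google", "google"],
})

# set_index returns a new DataFrame; df itself is left untouched
indexed = df.set_index(["Company", "date"])

print(df.index.nlevels)            # 1 (still the default index)
print(list(indexed.index.names))   # ['Company', 'date']
print(list(indexed.columns))       # ['Emp1', 'Empl2']
```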

Note how set_index() overwrites the old index by default. You can keep the old index by appending the new indices via the append= parameter.

df = df.set_index(['Company', 'date'], append=True)
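
A small sketch of what append=True does, again on made-up data (the row labels 0, 1, 18, 19 imitate the question's non-contiguous index). The old index is kept as the first, unnamed level and the new columns are added after it.

```python
import pandas as pd

# Hypothetical sample data with a non-contiguous index, as in the question
df = pd.DataFrame(
    {
        "Emp1": [0, 0, 1, 1],
        "date": ["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"],
        "Company": ["apple", "apple", "google", "google"],
    },
    index=[0, 1, 18, 19],
)

appended = df.set_index(["Company", "date"], append=True)

print(appended.index.nlevels)        # 3
print(list(appended.index.names))    # [None, 'Company', 'date']
```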

The new index doesn't need to come from the columns. You can pass a pandas Series or a numpy array of the same length as the dataframe to set_index().

new_idx = pd.Series(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
df = df.set_index([new_idx, 'date'])
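
The following is a self-contained sketch of mixing an external Series with a column label. The data and the names new_idx and result are assumptions for illustration only. Because the Series here has no name, the resulting level is unnamed.

```python
import pandas as pd

# Hypothetical four-row frame; new_idx must have the same length
df = pd.DataFrame({
    "Emp1": [0, 0, 1, 1],
    "date": ["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"],
})
new_idx = pd.Series(["a", "b", "c", "d"])

# Mix an external Series with an existing column label
result = df.set_index([new_idx, "date"])

print(list(result.index.names))   # [None, 'date']
print(result.index[0])            # ('a', '2012-05-01')
```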

To set a brand new MultiIndex, you can build a pd.MultiIndex object and assign it to df.index. Depending on what you build the index from, there are convenient constructors: from_arrays(), from_tuples(), and from_product().

For example, if you want to create a MultiIndex from the Cartesian product of lst1 and lst2, you can do so by calling from_product(). Note that the length of the MultiIndex must match the length of the dataframe for this to work.

lst1 = ['a', 'b', 'c', 'd']
lst2 = [100, 200]
df.index = pd.MultiIndex.from_product([lst1, lst2])
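
Here is a runnable sketch of the from_product() approach on an assumed eight-row frame. The column name value and the level names letter and number are hypothetical. It also builds the same index with from_tuples() to show that the constructors are interchangeable when the inputs describe the same pairs.

```python
import pandas as pd

# Hypothetical frame with 8 rows, matching 4 * 2 index combinations
df = pd.DataFrame({"value": range(8)})

lst1 = ["a", "b", "c", "d"]
lst2 = [100, 200]

# Cartesian product: (a, 100), (a, 200), (b, 100), ...
mi = pd.MultiIndex.from_product([lst1, lst2], names=["letter", "number"])
df.index = mi

# The same index built from explicit tuples
same = pd.MultiIndex.from_tuples(
    [(a, n) for a in lst1 for n in lst2], names=["letter", "number"]
)

print(len(mi))                        # 8
print(df.loc[("b", 200), "value"])    # 3
```

Assigning an index whose length differs from the number of rows raises a ValueError, so it can help to check len(mi) == len(df) first.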

Upvotes: 6

Andy Hayden

Reputation: 375845

When you pass inplace=True, the method modifies the original DataFrame directly; it does not return the modified DataFrame, it returns None.

is_none = df.set_index(['Company', 'date'], inplace=True)
df  # the dataframe you want
is_none # has the value None

so when you have a line like:

df = df.set_index(['Company', 'date'], inplace=True)

it first modifies df... but then it sets df to None!

That is, you should just use the line:

df.set_index(['Company', 'date'], inplace=True)
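
To check this yourself, here is a minimal sketch on made-up data (the name result is only for illustration). It shows that the call returns None while df itself ends up with the new index.

```python
import pandas as pd

# Hypothetical sample data mirroring a few rows of the question's table
df = pd.DataFrame({
    "Emp1": [0, 0, 1, 1],
    "date": ["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"],
    "Company": ["apple", "apple", "google", "google"],
})

# With inplace=True the return value is None, so do not assign it to df
result = df.set_index(["Company", "date"], inplace=True)

print(result is None)          # True
print(list(df.index.names))    # ['Company', 'date']
```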

Upvotes: 99
