dmvianna

Reputation: 15718

Creating an empty MultiIndex

I would like to create an empty DataFrame with a MultiIndex before assigning rows to it. I have already found that an empty DataFrame does not accept a MultiIndex assigned on the fly, so I am setting the MultiIndex names at creation. However, I don't want to assign levels yet, as that will be done later. This is the best code I have come up with so far:

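from pandas import DataFrame, MultiIndex

def empty_multiindex(names):
    """
    Creates empty MultiIndex from a list of level names.
    """
    # Workaround: a single placeholder tuple of Nones, one per level.
    return MultiIndex.from_tuples(tuples=[(None,) * len(names)], names=names)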

Which gives me

In [2]:

empty_multiindex(['one','two', 'three'])

Out[2]:

MultiIndex(levels=[[], [], []],
           labels=[[-1, -1, -1], [-1, -1, -1], [-1, -1, -1]],
           names=[u'one', u'two', u'three'])

and

In [3]:
DataFrame(index=empty_multiindex(['one','two', 'three']))

Out[3]:
one two three
NaN NaN NaN

Well, I have no use for these NaNs. I can easily drop them later, but this is obviously a hackish solution. Does anyone have a better one?

Upvotes: 56

Views: 35225

Answers (4)

RoG

Reputation: 849

The solution is to pass empty labels (called codes in current pandas), one empty list per level. This works fine for me:

>>> import pandas as pd
>>> my_index = pd.MultiIndex(levels=[[],[],[]],
...                          codes=[[],[],[]],
...                          names=[u'one', u'two', u'three'])
>>> my_index
MultiIndex([], names=['one', 'two', 'three'])
>>> my_columns = [u'alpha', u'beta']
>>> df = pd.DataFrame(index=my_index, columns=my_columns)
>>> df
Empty DataFrame
Columns: [alpha, beta]
Index: []
>>> df.loc[('apple','banana','cherry'),:] = [0.1, 0.2]
>>> df
                    alpha beta
one   two    three
apple banana cherry   0.1  0.2

For pandas versions before 0.24, use the keyword labels in place of codes. The labels keyword was deprecated in 0.24 and removed in 1.0.
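To tie this back to the question, the constructor call can be wrapped in a drop-in replacement for the asker's empty_multiindex. This is only a minimal sketch, assuming pandas 0.24 or later (where the keyword is codes); the function name is taken from the question and is not part of pandas.

```python
import pandas as pd


def empty_multiindex(names):
    """Return a MultiIndex with the given level names and no entries."""
    # One empty level and one empty code list per name.
    # Assumption: pandas >= 0.24, where the keyword is `codes`.
    return pd.MultiIndex(
        levels=[[] for _ in names],
        codes=[[] for _ in names],
        names=names,
    )


idx = empty_multiindex(["one", "two", "three"])
df = pd.DataFrame(index=idx, columns=["alpha", "beta"])

print(len(idx), idx.nlevels)  # number of entries, number of levels
print(df.empty)               # whether the frame has any rows at all
```

Because the index starts with no entries, there is no placeholder NaN row to drop afterwards.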

Upvotes: 58

ronkov

Reputation: 1583

Using pd.MultiIndex.from_tuples may be more straightforward.

import pandas as pd
ind = pd.MultiIndex.from_tuples([], names=(u'one', u'two', u'three'))
df = pd.DataFrame(columns=['alpha', 'beta'], index=ind)
df.loc[('apple','banana','cherry'), :] = [4, 3]
df

                      alpha beta
one     two     three       
apple   banana  cherry    4    3

Upvotes: 15

mcsoini

Reputation: 6642

Using pd.MultiIndex.from_arrays allows for a slightly more concise solution when defining the index explicitly:

import pandas as pd
ind = pd.MultiIndex.from_arrays([[]] * 3, names=(u'one', u'two', u'three'))
df = pd.DataFrame(columns=['alpha', 'beta'], index=ind)
df.loc[('apple','banana','cherry'), :] = [4, 3]

                     alpha  beta
one   two    three              
apple banana cherry      4     3

Upvotes: 4

Jean Paul

Reputation: 1578

Another solution, which may be a little simpler, is to use set_index:

>>> import pandas as pd
>>> df = pd.DataFrame(columns=['one', 'two', 'three', 'alpha', 'beta'])
>>> df = df.set_index(['one', 'two', 'three'])
>>> df
Empty DataFrame
Columns: [alpha, beta]
Index: []
>>> df.loc[('apple','banana','cherry'),:] = [0.1, 0.2]
>>> df
                    alpha beta
one   two    three            
apple banana cherry   0.1  0.2
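As a quick sanity check, the set_index result can be compared with the constructions from the other answers. This is a sketch, assuming a recent pandas version (0.24 or later, for the codes keyword); the dictionary and variable names are illustrative only.

```python
import pandas as pd

names = ["one", "two", "three"]

# The four constructions from the answers, side by side.
candidates = {
    "constructor": pd.MultiIndex(
        levels=[[], [], []], codes=[[], [], []], names=names
    ),
    "from_tuples": pd.MultiIndex.from_tuples([], names=names),
    "from_arrays": pd.MultiIndex.from_arrays([[], [], []], names=names),
    "set_index": pd.DataFrame(columns=names + ["alpha", "beta"])
    .set_index(names)
    .index,
}

for label, idx in candidates.items():
    # Report the type, level count, entry count and level names of each index.
    print(label, type(idx).__name__, idx.nlevels, len(idx), list(idx.names))
```

If every line reports a MultiIndex with three levels and zero entries, the choice between the approaches is mostly a matter of readability.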

Upvotes: 40
