Convert string representation of multiIndex pandas into multiIndex pandas in python

Question

I have a string representation of a MultiIndex, created as shown below.

import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
df = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = str(df)

I would like to convert the string df back into a pandas MultiIndex object. Is there a direct function in pandas for this?

Expected output:

print(df)
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
       labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
       names=['first', 'second'])

Thanks in advance.

Matthias Fripp · Accepted Answer

The string representation of the MultiIndex is nearly executable code, so you could evaluate it with eval, like this:

eval(df, {}, {'MultiIndex': pd.MultiIndex})
# MultiIndex(levels=[[u'bar', u'baz', u'foo', u'qux'], [u'one', u'two']],
#        labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
#        names=[u'first', u'second'])

Just be careful that you control the string you pass to eval, since it could be used to crash your computer and/or run arbitrary code.
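To illustrate the risk, here is a minimal sketch. The payload string is a harmless, hypothetical stand-in for attacker-controlled input. Passing empty dictionaries as the namespaces does not remove access to built-ins, because Python inserts `__builtins__` into the globals dictionary automatically, so the string can still import modules and call functions:

```python
import os

# Hypothetical untrusted input; a real attacker could call anything here.
payload = "__import__('os').getcwd()"

# Even with empty globals and locals, eval still resolves __import__,
# because Python adds __builtins__ to the globals dict automatically.
result = eval(payload, {}, {})

# The string was executed as code: it imported os and called getcwd().
print(result == os.getcwd())
```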

Alternatively, here's a safe and simple but somewhat brittle way to do this:

import ast
# convert df into a literal string defining a dictionary
# (df[11:-1] strips the leading "MultiIndex(" and the trailing ")")
dfd = (
    ("{" + df[11:-1] + "}")  # parenthesized so .replace() applies to the whole string
    .replace("levels=", "'levels':")
    .replace("labels=", "'labels':")
    .replace("names=", "'names':")
)
# convert it safely into an actual dictionary
args = ast.literal_eval(dfd)
# use the dictionary as arguments to pd.MultiIndex
pd.MultiIndex(**args)

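With this code, an arbitrary string cannot run code on your computer, since ast.literal_eval() only accepts literal expressions (strings, numbers, tuples, lists, dicts, sets, booleans and None) and rejects function calls and name lookups. It is brittle because it depends on the exact argument names and on the string starting with "MultiIndex(".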
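As a quick check of that behavior, the sketch below feeds ast.literal_eval() a hypothetical malicious string and a plain literal. The payload and variable names are illustrative only:

```python
import ast

# Hypothetical untrusted input that tries to call a function.
payload = "__import__('os').getcwd()"

try:
    ast.literal_eval(payload)
    rejected = False
except ValueError:
    # literal_eval refuses function calls and name lookups
    rejected = True

# Plain literals, such as a dict of lists, are accepted.
parsed = ast.literal_eval("{'names': ['first', 'second']}")

print(rejected, parsed)
```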

Here's a version that's safe and doesn't require pre-specifying the argument names, but it's more complex:

import ast, tokenize
from io import StringIO  # on Python 2, use: from cStringIO import StringIO
tokens = [  # make a list of mutable tokens
    list(t)
    for t in tokenize.generate_tokens(StringIO('{' + df[11:-1] + '}').readline)
]
for t, next_t in zip(tokens[:-1], tokens[1:]):
    # convert `identifier=` to `'identifier':`
    # (named constants are used because the numeric token codes differ between Python versions)
    if t[0] == tokenize.NAME and next_t[0] == tokenize.OP and next_t[1] == '=':
        t[0] = tokenize.STRING    # switch type to quoted string
        t[1] = "'" + t[1] + "'"   # put quotes around identifier
        next_t[1] = ':'           # convert '=' to ':'
args = ast.literal_eval(tokenize.untokenize(tokens))
pd.MultiIndex(**args)

Note that this code will raise an exception if df is malformed or contains 'identifier=...' as code (not inside strings) at lower levels. But I don't think that can happen with str(MultiIndex). If that is an issue, you could generate an ast tree for the original df string, then extract the arguments and convert those programmatically into a literal definition for a dict ({x: y}, not dict(x=y)), then use ast.literal_eval to evaluate that.
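A minimal sketch of that ast-based idea follows. It is a variation rather than a literal implementation of the description above: instead of building a dict literal, it parses the string, checks that it is a single call to MultiIndex with keyword arguments only, and runs ast.literal_eval on each argument's node. The helper name parse_call_kwargs is hypothetical, and the sketch assumes the levels/labels format shown in the question; newer pandas releases print a MultiIndex differently.

```python
import ast


def parse_call_kwargs(text, expected_name="MultiIndex"):
    """Return the keyword arguments of a call like Name(a=..., b=...) as a dict.

    Only literal argument values are accepted; nothing in `text` is executed.
    """
    tree = ast.parse(text.strip(), mode="eval")
    call = tree.body
    if not (
        isinstance(call, ast.Call)
        and isinstance(call.func, ast.Name)
        and call.func.id == expected_name
        and not call.args
    ):
        raise ValueError("expected %s(key=value, ...)" % expected_name)
    kwargs = {}
    for kw in call.keywords:
        if kw.arg is None:  # a **mapping argument, which is not a plain keyword
            raise ValueError("unexpected ** argument")
        # literal_eval accepts an AST node and raises ValueError for non-literals
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    return kwargs


# The string from the question, hard-coded so the example needs no pandas.
s = """MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
       labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
       names=['first', 'second'])"""
args = parse_call_kwargs(s)
print(sorted(args))
```

The resulting dict can then be passed on as pd.MultiIndex(**args). In pandas 0.24 and later the labels argument is named codes, so that key may need renaming first.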
