Reputation: 467
I have a string representation of a MultiIndex, created as below.
import pandas as pd
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
df = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = str(df)
I would like to convert the string df back into a pandas MultiIndex object. Is there a direct function in pandas for this?
Expected output:
print(df)
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['first', 'second'])
Thanks in advance.
Upvotes: 1
Views: 296
Reputation: 18625
The string representation of the MultiIndex is nearly executable code, so you could evaluate it with eval, like this:
eval(df, {}, {'MultiIndex': pd.MultiIndex})
# MultiIndex(levels=[[u'bar', u'baz', u'foo', u'qux'], [u'one', u'two']],
# labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
# names=[u'first', u'second'])
Just be careful that you control the string you pass to eval, since it could be used to crash your computer and/or run arbitrary code (see here and here).
Alternatively, here's a safe and simple but somewhat brittle way to do this:
import ast
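# convert df into a literal string defining a dictionary
dfd = (
    # strip "MultiIndex(" and the final ")", then wrap in braces; the inner
    # parentheses make .replace() apply to the whole string, not just "}"
    ("{" + df[11:-1] + "}")
    .replace("levels=", "'levels':")
    .replace("labels=", "'labels':")
    .replace("names=", "'names':")
)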
# convert it safely into an actual dictionary
args = ast.literal_eval(dfd)
# use the dictionary as arguments to pd.MultiIndex
pd.MultiIndex(**args)
With this code, an arbitrary string cannot run code on your computer, since ast.literal_eval() doesn't evaluate function calls or general operators, just literal expressions.
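To illustrate that point, here is a minimal sketch (Python 3 assumed) of how ast.literal_eval treats a plain literal versus an expression that contains a call or an operator. The helper name try_literal is hypothetical and exists only for this demonstration.

```python
import ast

def try_literal(text):
    """Return (True, value) if text is a pure literal, else (False, error message)."""
    try:
        return True, ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        # literal_eval refuses anything that is not a literal
        return False, str(exc)

# A dict literal is accepted and converted into a real dict.
print(try_literal("{'names': ['first', 'second']}"))

# A function call is rejected instead of being executed.
print(try_literal("__import__('os').getcwd()")[0])

# So is a general operator expression.
print(try_literal("2 ** 10")[0])
```

The rejected inputs are never executed; they only produce an exception, which the helper turns into a message.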
Here's a version that's safe and doesn't require pre-specifying the argument names, but it's more complex:
import ast, tokenize
try:
    from cStringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
tokens = [  # make a list of mutable tokens
    list(t)
    for t in tokenize.generate_tokens(StringIO('{' + df[11:-1] + '}').readline)
]
for t, next_t in zip(tokens[:-1], tokens[1:]):
    # convert `identifier=` to `'identifier':`
    # (named constants are used because the numeric token codes differ between Python versions)
    if t[0] == tokenize.NAME and next_t[0] == tokenize.OP and next_t[1] == '=':
        t[0] = tokenize.STRING  # switch type to quoted string
        t[1] = "'" + t[1] + "'"  # put quotes around identifier
        next_t[1] = ':'  # convert '=' to ':'
args = ast.literal_eval(tokenize.untokenize(tokens))
pd.MultiIndex(**args)
Note that this code will raise an exception if df is malformed or contains 'identifier=...' as code (not inside strings) at lower levels. But I don't think that can happen with str(MultiIndex). If that is an issue, you could generate an ast tree for the original df string, then extract the arguments and convert those programmatically into a literal definition for a dict ({x: y}, not dict(x=y)), then use ast.literal_eval to evaluate that.
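The ast-based idea could look roughly like the sketch below. It assumes Python 3 and the levels/labels/names style of string shown in the question. The function name parse_multiindex_repr and its expected_name parameter are hypothetical. Rather than rebuilding a dict literal as text, this variant passes each keyword's value node straight to ast.literal_eval, which accepts AST nodes as well as strings.

```python
import ast

def parse_multiindex_repr(text, expected_name="MultiIndex"):
    """Turn 'MultiIndex(a=..., b=...)' into {'a': ..., 'b': ...} without eval."""
    tree = ast.parse(text.strip(), mode="eval")
    call = tree.body
    # Accept only a bare call to the expected name, with keyword arguments only.
    if not (isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id == expected_name
            and not call.args):
        raise ValueError("not a plain %s(...) call" % expected_name)
    args = {}
    for kw in call.keywords:
        if kw.arg is None:  # a '**something' argument
            raise ValueError("unpacked keyword arguments are not supported")
        # literal_eval raises ValueError if the value is not a pure literal
        args[kw.arg] = ast.literal_eval(kw.value)
    return args

sample = (
    "MultiIndex(levels=[['bar', 'baz'], ['one', 'two']],\n"
    "           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],\n"
    "           names=['first', 'second'])"
)
args = parse_multiindex_repr(sample)
print(sorted(args))
```

The resulting dictionary could then be passed as pd.MultiIndex(**args), as in the snippets above. That call is left out here so the sketch runs without pandas.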
Upvotes: 2