Reputation: 1644
I'm trying to read in a dataframe created via df.to_json()
via pd.read_json
but I'm getting a ValueError
. I think it may have to do with the fact that the index is a MultiIndex but I'm not sure how to deal with that.
The original dataframe of 55k rows is called psi
and I created test.json
via:
psi.head().to_json('test.json')
Hereis the output of print psi.head().to_string()
if you want to use that.
When I do it on this small set of data (5 rows), I get a ValueError
.
! wget --no-check-certificate https://gist.githubusercontent.com/olgabot/9897953/raw/c270d8cf1b736676783cc1372b4f8106810a14c5/test.json
import pandas as pd
pd.read_json('test.json')
Here's the full stack:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-1de2f0e65268> in <module>()
1 get_ipython().system(u' wget https://gist.githubusercontent.com/olgabot/9897953/raw/c270d8cf1b736676783cc1372b4f8106810a14c5/test.json'>)
2 import pandas as pd
----> 3 pd.read_json('test.json')
/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
196 obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
197 keep_default_dates, numpy, precise_float,
--> 198 date_unit).parse()
199
200 if typ == 'series' or obj is None:
/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
264
265 else:
--> 266 self._parse_no_numpy()
267
268 if self.obj is None:
/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self)
481 if orient == "columns":
482 self.obj = DataFrame(
--> 483 loads(json, precise_float=self.precise_float), dtype=None)
484 elif orient == "split":
485 decoded = dict((str(k), v)
ValueError: No ':' found when decoding object value
> /home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.py(483)_parse_no_numpy()
482 self.obj = DataFrame(
--> 483 loads(json, precise_float=self.precise_float), dtype=None)
484 elif orient == "split":
But when I do it on the whole dataframe (55k rows) then I get an invalid pointer error and the IPython kernel dies. Any ideas?
EDIT: added how the json was generated in the first place.
Upvotes: 12
Views: 9735
Reputation: 33
this was my simple dirty fix for encoding/decoding multiindex pandas dataframe which seems to also work for datetime in index/columns... not optimized!
here is the encoder to json - I encoder the dataframe, index and columns into a dict to create a json
import json
import pandas as pd
def to_json_multiindex(df):
dfi = df.index.to_frame()
dfc = df.columns.to_frame()
d = dict(
df = df.to_json(),
di = dfi.to_json(),
dc = dfc.to_json()
)
return json.dumps(d)
meanwhile here is the decoder which reads the json dict and re-creates the dataframe
def read_json_multiindex(j):
d = json.loads(j)
di=pd.read_json(d['di'])
if di.shape[1]>1:
di = pd.MultiIndex.from_frame(di)
else:
_name = di.columns[0]
di = di.index
di.name = _name
dc=pd.read_json(d['dc'])
if dc.shape[1]>1:
dc = pd.MultiIndex.from_frame(dc)
else:
_name = dc.columns[0]
dc = dc.index
dc.name = _name
df = pd.read_json(d['df']).values
return pd.DataFrame(
data=df,
index=di,
columns=dc,
)
and here is a test for multiindex columns and index... seems to preserve the dataframe. Couple of issues 1) probably inefficient and 2) does seem to work for datatime in multiindex (but works when it isn't multiindex)
df = pd.DataFrame(
data = [[0,1,2],[2,3,4],[5,6,7]],
index = pd.MultiIndex.from_tuples(
(('aa','bb'),('aa','cc'),('bb','cc')
),
names=['AA','BB']),
columns = pd.MultiIndex.from_tuples(
(('XX','YY'),('XX','ZZ'),('YY','ZZ')
),
names=['YY','ZZ'])
)
j = to_json_multiindex(df)
d = read_json_multiindex(j)
Upvotes: 0
Reputation: 973
if you want to return MultiIndex structure:
# save MultiIndex indexes names
indexes_names = df.index.names
df.reset_index().to_json('dump.json')
# return back MultiIndex structure:
loaded_df = pd.read_json('dump.json').set_index(indexes_names)
Upvotes: -1
Reputation: 3297
Or you can just write json with orient = 'table'
df.to_json(path_or_buf='test.json', orient='table')
read multi_index json
pd.read_json('test.json', orient='table')
Upvotes: 2
Reputation: 128988
This is not implemented ATM, see the issue here: https://github.com/pydata/pandas/issues/4889.
You can simply reset the index first, e.g
df.reset_index().to_json(...)
and it will work.
Upvotes: 10