Why is pandas read_json returning a DataFrame with only one column?

Question

I have a JSON file (json_test.json) that is formatted as follows:

{"Col1;Col2;Col3;Col4;Col5":{"0":"value;value;value;value;value","1":"value;value;value;value;value","2":"value;value;value;value;value","N":"value;value;value;value;value"}}

To me, this looks like the orient "columns" that pandas specifies in their documentation: 'columns' : dict like {column -> {index -> value}}

However, running my json through pd.read_json only returns 1 column with 4 rows.

I.e.:

import pandas as pd
df2 = pd.read_json(r"data\json_test.json")  # raw string: "\j" is not a valid escape
df2.info()


Index: 4 entries, 0 to N
Data columns (total 1 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Col1;Col2;Col3;Col4;Col5  4 non-null      object
dtypes: object(1)
memory usage: 64.0+ bytes
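A minimal sketch that reproduces this result and contrasts it with JSON that has one key per column. The `separate` string is a hypothetical example, not from the original file; `io.StringIO` is used because newer pandas versions deprecate passing literal JSON strings to `read_json`.

```python
import io
import pandas as pd

# Same shape as json_test.json: ONE top-level key whose name happens to contain semicolons
packed = '{"Col1;Col2;Col3;Col4;Col5":{"0":"value;value;value;value;value","1":"value;value;value;value;value","2":"value;value;value;value;value","N":"value;value;value;value;value"}}'
# Hypothetical contrast: the same orient, but with one key per column
separate = '{"Col1":{"0":"a","1":"b"},"Col2":{"0":"c","1":"d"}}'

packed_df = pd.read_json(io.StringIO(packed), orient="columns")
separate_df = pd.read_json(io.StringIO(separate), orient="columns")

# One column, because pandas never splits a key or a value on ";"
print(packed_df.shape)
# Two columns, because there are two top-level keys
print(list(separate_df.columns))
```

Both inputs are valid orient="columns" JSON; the number of columns simply follows the number of top-level keys.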

Can anyone help me understand what is going on here, and how to properly read in this JSON file? I am not really familiar with JSON, and most examples I've seen online are for very standardized JSON formats.

Thank you!

Rob Raymond · Accepted Answer

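You have JSON as the overall structure, but within the JSON keys and values you have semicolon-delimited strings. read_json is parsing the JSON correctly: there is exactly one top-level key, so you get exactly one column, and pandas has no reason to split that key or its values on ";".
This can easily be fully decoded by:
1. initialising a DataFrame with pd.DataFrame() from the JSON
2. expanding the delimited keys and values using split(";")
3. converting these lists into pd.Series so the result is a DataFrame with proper columns and values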

import pandas as pd

d = {"Col1;Col2;Col3;Col4;Col5":{"0":"value;value;value;value;value","1":"value;value;value;value;value","2":"value;value;value;value;value","N":"value;value;value;value;value"}}
df = pd.DataFrame(d)  # one column named "Col1;Col2;Col3;Col4;Col5"

# split every value on ";" and label the pieces with the split column name
df2 = df.iloc[:,0].apply(lambda s: pd.Series(s.split(";"), index=df.columns[0].split(";")))

df2

	Col1	Col2	Col3	Col4	Col5
0	value	value	value	value	value
1	value	value	value	value	value
2	value	value	value	value	value
N	value	value	value	value	value
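To read straight from the file, the same idea can be written with the vectorised Series.str.split(..., expand=True) instead of apply. A minimal sketch, assuming every row has the same number of ";"-separated fields as the header; read_semicolon_json is a hypothetical helper name, and io.StringIO stands in for the file path.

```python
import io
import pandas as pd

# Stand-in for the contents of json_test.json
raw = '{"Col1;Col2;Col3;Col4;Col5":{"0":"value;value;value;value;value","1":"value;value;value;value;value","2":"value;value;value;value;value","N":"value;value;value;value;value"}}'

def read_semicolon_json(source):
    """Hypothetical helper: read the one-column JSON, then split header and values on ';'."""
    df = pd.read_json(source, orient="columns")
    packed_col = df.columns[0]  # "Col1;Col2;Col3;Col4;Col5"
    # expand=True returns a DataFrame with one column per ';'-separated field
    out = df[packed_col].str.split(";", expand=True)
    out.columns = packed_col.split(";")  # raises if the field count differs from the header
    return out

df2 = read_semicolon_json(io.StringIO(raw))
print(df2)
```

Passing the path (e.g. r"data\json_test.json") instead of the StringIO works the same way. All columns come back as strings, so convert them afterwards (for example with pd.to_numeric) if the real values are numbers.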
