Worst SQL Noob
Worst SQL Noob

Reputation: 189

ValueError: DataFrame constructor not properly called when convert Json data to Dataframe

I meet a problem when I try to covert Json data to Dataframe using pandas and json package.

My raw data from Json file looks like:

{"Number":41,"Type":["A1","A2","A3","A4","A5"],"Percent":{"Very Good":1.2,"Good":2.1,"OK":1.1,"Bad":1.3,"Very Bad":1.7}}

And my code is:

import pandas as pd
import json

with open('Test.json', 'r') as filename:
    json_file=json.load(filename)
    df =pd.DataFrame(json_file['Type'],columns=['Type'])

The problem is when I only read Type from Json file, it gives me the correct result which looks like:

   Type
0   A1
1   A2
2   A3
3   A4
4   A5

However when only read Number from Json file:

 df =pd.DataFrame(json_file['Number'],columns=['Number'])

It gives me error: ValueError: DataFrame constructor not properly called!

Also If I use:

df = pd.DataFrame.from_dict(json_file)

I get the error:

ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.

I have done some researches on Google, but still cannot figure out why.

My goal is to break this Json data into two Dataframe, the first one is combine the Number and Type together:

   Number  Type
0    41     A1
1    41     A2
2    41     A3
3    41     A4
4    41     A5

Another Dataframe I want to get is the data in the Percent, which may look like:

Very Good  1.2
Good 2.1
OK 1.1
Bad 1.3
Very Bad 1.7

Upvotes: 2

Views: 8372

Answers (1)

user3212593
user3212593

Reputation: 496

This should give your desired output:

with open('Test.json', 'r') as filename:
    json_file=json.load(filename)
    df = pd.DataFrame({'Number': json_file.get('Number'), 'Type': json_file.get('Type')})
    df2 = pd.DataFrame({'Percent': json_file.get('Percent')})

   Number type
0      41   A1
1      41   A2
2      41   A3
3      41   A4
4      41   A5

           Percent
Bad            1.3
Good           2.1
OK             1.1
Very Bad       1.7
Very Good      1.2

You can generalize this into a function:

def json_to_df(d, ks):
    return pd.DataFrame({k: d.get(k) for k in ks})

df = json_to_df(json_file, ['Number', 'Type'])

If you wish to avoid using the json package, you can do it directly in pandas:

_df = pd.read_json('Test.json', typ='series')
df = pd.DataFrame.from_dict(dict(_df[['Number', 'Type']]))
df2 = pd.DataFrame.from_dict(_df['Percent'], orient='index', columns=['Percent'])

Upvotes: 2

Related Questions