sstevan

Reputation: 487

Python pandas to_json() invalid format

I'm having trouble with the JSON string that to_json() produces. I'm reading a tab-separated CSV file that looks like this:

date        time        loc_id  country name    sub1_id sub2_id type
2014-09-11  00:00:01    179     US      acmnj   269     382     ico 
2014-09-11  00:00:01    179     US      acmnj   269     382     ico 
2014-09-11  00:00:01    179     GB      acmnj   269     382     ico 
2014-09-11  00:00:01    179     US      acmnj   269     382     ico 
2014-09-11  00:00:02    179     GB      acmnj   269     383     ico 
2014-09-11  00:00:02    179     JP      acmnj   269     383     ico 

Code looks like this:

import pandas as pd

df = pd.read_csv('log.csv', sep='\t', encoding='utf-16')
count = df.groupby(['country', 'name', 'sub1_id', 'sub2_id', 'type']).size()
print(count.order(na_position='last', ascending=False).to_frame().to_json(orient='index'))

Output looks like this (first few lines):

{"["US","acmnj",269,383,"ico"]":{"0":76174},"["US","acmnj",269,382,"ico"]":{"0":73609},"["IT","acmnj",269,383,"ico"]":{"0":54211},"["IT","acmnj",269,382,"ico"]":{"0":52398},"["GB","acmnj",269,383,"ico"]":{"0":41346},"["GB","acmnj",269,382,"ico"]":{"0":40140},"["US","acmnj",269,405,"ico"]":{"0":39482},"["US","acmnj",269,400,"ico"]":{"0":39303},"["US","popcdd",178,365,"ico"]":{"0":33168},"["IT","acmnj",269,400,"ico"]":{"0":33026},"["IT","acmnj",269,405,"ico"]":{"0":32824},"["IT","achrfb141",141,42,"ico"]":{"0":26986},"["GB","acmnj",269,405,"ico"]":{"0":25895},"["IN","acmnj",269,383,"ico"]":{"0":25647},"["GB","acmnj",269,400,"ico"]":{"0":25488...

I want to load this output in PHP, but I get NULL when I try to decode it. I checked the string with a JSON validator and it was reported as invalid. I also tried without the orient parameter, but the output is still invalid JSON.

Upvotes: 2

Views: 2046

Answers (1)

trvrm

Reputation: 814

This does seem to be a problem with Pandas. I reproduced your error.

DataFrame.to_json can take several different orient arguments: 'split', 'records', 'index', 'columns' and 'values'.

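In your case, 'split', 'records' and 'values' seem to work, but 'index' and 'columns' don't. Judging by your output, those two orients write each MultiIndex tuple as an object key without escaping the quotes inside it (for example "["US","acmnj",269,383,"ico"]"), which is why the string fails to parse.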

You can quickly test this in Python by parsing the output with the json module:

import json
import pandas as pd

df = pd.read_csv('log.csv', sep='\t', encoding='utf-16')
count = df.groupby(['country', 'name', 'sub1_id', 'sub2_id', 'type']).size()
# Series.order() was replaced by sort_values() in later pandas releases.
f = count.order(ascending=False).to_frame()
json.loads(f.to_json(orient='index'))    # This failed for me
json.loads(f.to_json(orient='records'))  # This worked
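
Note that orient='records' drops the index, so the group keys would be missing from the output. One possible workaround, sketched below, is to turn the group keys into ordinary columns with reset_index() before serializing. The inline sample data and the column name 'count' are my own assumptions for illustration, and sort_values() is used in place of order(), which newer pandas versions no longer provide.

```python
import io
import json

import pandas as pd

# Small tab-separated sample modeled on the question's log.csv (illustrative only).
sample = (
    "date\ttime\tloc_id\tcountry\tname\tsub1_id\tsub2_id\ttype\n"
    "2014-09-11\t00:00:01\t179\tUS\tacmnj\t269\t382\tico\n"
    "2014-09-11\t00:00:01\t179\tUS\tacmnj\t269\t382\tico\n"
    "2014-09-11\t00:00:01\t179\tGB\tacmnj\t269\t382\tico\n"
    "2014-09-11\t00:00:01\t179\tUS\tacmnj\t269\t382\tico\n"
    "2014-09-11\t00:00:02\t179\tGB\tacmnj\t269\t383\tico\n"
    "2014-09-11\t00:00:02\t179\tJP\tacmnj\t269\t383\tico\n"
)
df = pd.read_csv(io.StringIO(sample), sep="\t")

counts = (
    df.groupby(["country", "name", "sub1_id", "sub2_id", "type"])
    .size()
    .reset_index(name="count")  # group keys become ordinary columns
    .sort_values("count", ascending=False)
)

# Each row becomes one flat JSON object, so no tuple-valued keys are needed.
json_text = counts.to_json(orient="records")
records = json.loads(json_text)  # raises ValueError if the JSON is invalid
print(records[0])
```

On the PHP side, json_decode() should then return a plain list of objects, each carrying the group keys alongside its count.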

Upvotes: 4
