J3319

Reputation: 25

JSON file to Pandas df

I'm trying to convert a JSON file into a pandas DataFrame so that I can remove the unwanted data and reduce it to a CSV of IDs. The data looks like this:

{
     "data": [
    {
      "message": "Uneeded message",
      "created_time": "2017-04-02T17:20:37+0000",
      "id": "723456782912449_1008262099345654"
    },
    {
      "message": "Uneeded message",
      "created_time": "2017-03-28T06:26:28+0000",
      "id": "771345678912449_1003934567871010"
    },

I've not used JSON before, but the code I've used to load this data is:

import pandas as pd
import json

with open('fileName.json', encoding="utf8" ) as f:
    w = json.loads(f.read(), strict=False)

The end output should just be a CSV with a single column of IDs.

Upvotes: 1

Views: 1527

Answers (2)

piRSquared

Reputation: 294278

using json.loads

setup

json_str = """{
 "data": [
        {
          "message": "Uneeded message",
          "created_time": "2017-04-02T17:20:37+0000",
          "id": "723456782912449_1008262099345654"
        },
        {
          "message": "Uneeded message",
          "created_time": "2017-03-28T06:26:28+0000",
          "id": "771345678912449_1003934567871010"
        }]}"""

solution

import json
import pandas as pd

pd.DataFrame(json.loads(json_str)['data'])

               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message

Or, with the JSON in a file:

with open('neutraluk1.json') as f:
    print(pd.DataFrame(json.load(f)['data']))

               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message
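To get from this DataFrame to the CSV of IDs the question asks for, something like the following should work. It is a minimal sketch that reuses the two sample records from the question; the `ids.csv` filename mentioned in the comment is only a placeholder.

```python
import json
import pandas as pd

# Two sample records copied from the question.
json_str = """{
 "data": [
        {
          "message": "Uneeded message",
          "created_time": "2017-04-02T17:20:37+0000",
          "id": "723456782912449_1008262099345654"
        },
        {
          "message": "Uneeded message",
          "created_time": "2017-03-28T06:26:28+0000",
          "id": "771345678912449_1003934567871010"
        }]}"""

# The records live under the top-level "data" key.
df = pd.DataFrame(json.loads(json_str)["data"])

# Keep only the id column; the values are strings, so the underscore survives.
ids = df[["id"]]

# With no path given, to_csv returns the CSV as a string.
# To write a file instead, pass a path: ids.to_csv("ids.csv", index=False)
csv_text = ids.to_csv(index=False)
print(csv_text)
```

Passing `index=False` leaves out the row index, so the file contains just the `id` header and one ID per line.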

Upvotes: 1

jezrael

Reputation: 862681

I think you need json_normalize:

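# pandas >= 1.0 exposes json_normalize at the top level;
# older versions used: from pandas.io.json import json_normalize
from pandas import json_normalize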
import json

with open('file.json') as data_file:    
    d = json.load(data_file)

print (d)
{
    "data": [{
        "message": "Uneeded message",
        "created_time": "2017-04-02T17:20:37+0000",
        "id": "723456782912449_1008262099345654"
    }, {
        "message": "Uneeded message",
        "created_time": "2017-03-28T06:26:28+0000",
        "id": "771345678912449_1003934567871010"
    }]
}

df = json_normalize(d, 'data')
print (df)
               created_time                                id          message
0  2017-04-02T17:20:37+0000  723456782912449_1008262099345654  Uneeded message
1  2017-03-28T06:26:28+0000  771345678912449_1003934567871010  Uneeded message
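For flat records like these, `json_normalize` and a plain `pd.DataFrame` call should give the same columns and values. `json_normalize` mainly pays off when the records contain nested objects. The sketch below assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`, and uses an in-memory dict in place of the file.

```python
import pandas as pd

# In-memory stand-in for the parsed file contents.
d = {
    "data": [
        {
            "message": "Uneeded message",
            "created_time": "2017-04-02T17:20:37+0000",
            "id": "723456782912449_1008262099345654",
        },
        {
            "message": "Uneeded message",
            "created_time": "2017-03-28T06:26:28+0000",
            "id": "771345678912449_1003934567871010",
        },
    ]
}

# Plain constructor on the list of records.
df_plain = pd.DataFrame(d["data"])

# json_normalize, with record_path naming the key that holds the records.
df_norm = pd.json_normalize(d, record_path="data")

# Compare the two results column by column.
print(sorted(df_plain.columns) == sorted(df_norm.columns))
print(df_plain["id"].tolist() == df_norm["id"].tolist())
```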

Upvotes: 2
