Sophia
Sophia

Reputation: 5

Pandas 'read_json' not working as expected

I want to load a JSON file with pandas but it is not working as I expect it to! I have referred to this stackoverflow answers but my problem is not that. The JSON file looks like this:

JSON File

Code to load the File:-

import pandas as pd
df = pd.read_json("BrowserHistory.json")
print(df)

Output:-

Output Pandas Dataframe

But I don't want only 1 Column containing each json element. I want 6 Columns namely 'favicon_url', 'page_transition', 'title', 'url', 'client_id' and 'time_usec' as describle in above photo of 'json file' and then each column should contain the value it has in every element.

Like this:

favicon url   page_transition   title   url   client_id   time_user
    .                .            .      .        .           .
    .                .            .      .        .           .
    .                .            .      .        .           .
    .                .            .      .        .           .

JSON File:

{
    "Browser History": [
        {
            "favicon_url": "https://www.google.com/favicon.ico",
            "page_transition": "LINK",
            "title": "Google Takeout",
            "url": "https://takeout.google.com/",
            "client_id": "cliendid",
            "time_usec": 1620386529857946
},
        {
            "favicon_url": "https://www.google.com/favicon.ico",
            "page_transition": "LINK",
            "title": "Google Takeout",
            "url": "https://takeout.google.com/",
            "client_id": "cliendid",
            "time_usec": 1620386514845201
},
        {
            "favicon_url": "https://www.google.com/favicon.ico",
            "page_transition": "LINK",
            "title": "Google Takeout",
            "url": "https://takeout.google.com/",
            "client_id": "cliendid",
            "time_usec": 1620386499014063
},
        {
            "favicon_url": "https://ssl.gstatic.com/ui/v1/icons/mail/rfr/gmail.ico",
            "page_transition": "LINK",
            "title": "Gmail",
            "url": "https://mail.google.com/mail/u/0/#inbox",
            "client_id": "cliendid",
            "time_usec": 1620386492788783
}
  ]
}

Upvotes: 0

Views: 2636

Answers (2)

Nick
Nick

Reputation: 147216

Because your objects are in a list at the second level down of your JSON, you can't read it directly into a dataframe using read_json. Instead, you could read the json into a variable, and then create the dataframe from that:

import pandas as pd
import json

f = open("BrowserHistory.json")
js = json.load(f)
df = pd.DataFrame(js['Browser History'])
df
#                                          favicon_url page_transition  ... client_id         time_usec
# 0                 https://www.google.com/favicon.ico            LINK  ...  cliendid  1620386529857946
# 1                 https://www.google.com/favicon.ico            LINK  ...  cliendid  1620386514845201
# 2                 https://www.google.com/favicon.ico            LINK  ...  cliendid  1620386499014063
# 3  https://ssl.gstatic.com/ui/v1/icons/mail/rfr/g...            LINK  ...  cliendid  1620386492788783

Note you may need to specify the file encoding on the open call e.g.

f = open("BrowserHistory.json", encoding="utf8")

Upvotes: 2

Bruno Mello
Bruno Mello

Reputation: 4618

The problem is because of the {} that are around your file, pandas thinks that the first level of the JSON are the columns and thus it uses just Browser History as a column. You can use this code to solve your problem:

import pandas as pd
df = pd.DataFrame(json.load(open('BrowserHistory.json', encoding='cp850'))['Browser History'])
print(df)

Upvotes: 3

Related Questions