Reputation: 81
I need to create a pandas dataframe from a list of nested dictionaries .Below is my dictionary:
[
{
"id": "e182_1234",
"stderr": {
"type": "stderr",
"upload time": "Thu Jun 25 12:24:52 +0100 2020",
"length": 3000,
"contents": [
{
"date": "20/06/25",
"time": "12:19:39",
"type": "ERROR",
"text": "Exception found\njava.io.Exception:Not initated\n at.apache.java.org........",
"line_start": 12,
"line_end": 15
},
{
"date": "20/06/25",
"time": "12:20:41",
"type": "WARN",
"text": "Warning as the node is accessed without started",
"line_start": 17,
"line_end": 17
}
]
}
}
]
I tried to create a dataframe using the below code:
df=pd.DataFrame(filtered_data) #filtered_data is the above dictionary
res1=df.join(pd.DataFrame(df.pop("stderr").tolist()))
res2=res1.join(pd.DataFrame(res1.pop("contents").tolist()))
Result I got :
#df=pd.DataFrame(filtered_data)
id stderr
0 e182_1234 {'type': 'stderr', 'upload time': 'Thu Jun 25 ...
#res1=df.join(pd.DataFrame(df.pop("stderr").tolist()))
id type upload time length contents
0 e182_1234 stderr Thu Jun 25 12:24:52 +0100 2020 3000 [{'date': '20/06/25', 'time': '12:19:39', 'typ...
#res2=res1.join(pd.DataFrame(res1.pop("contents").tolist()))
id type upload time length 0 1
0 e182_1234 stderr Thu Jun 25 12:24:52 +0100 2020 3000 {'date': '20/06/25', 'time': '12:19:39', 'type... {'date': '20/06/25', 'time': '12:20:41', 'type...
As you can when I split those list of directories it comes with the column name 0
and 1
. I want those columns to be separated like date,time,type,text,line_start,line_end
as separate columns.
Expected Output:
id type upload time length date time type text line_start line_end
0 e182_1234 stderr Thu Jun 25 12:24:52 +0100 2020 3000 20/06/25 12:19:39 ERROR Exception found\njava.io.Exception:Not initated\n at.apache.java.org........ 12 15
1 e182_1234 stderr Thu Jun 25 12:24:52 +0100 2020 3000 20/06/25 12:20:41 WARN WARN Warning as the node is accessed without started 17 17
How to sort this issue ? Thanks in advance!
Upvotes: 0
Views: 86
Reputation: 7604
You can use json_normalize
for this:
with open('1.json', 'r+') as f:
data = json.load(f)
df = pd.json_normalize(data, record_path=['stderr', 'contents'], meta=[['id'], ['stderr', 'type']])
print(df)
date time type text line_start line_end id stderr.type
0 20/06/25 12:19:39 ERROR Exception found\njava.io.Exception:Not initate... 12 15 e182_1234 stderr
1 20/06/25 12:20:41 WARN Warning as the node is accessed without started 17 17 e182_1234 stderr
Upvotes: 2