Reputation: 642
I have a json which comprises like this.
[
{
"bi-document-identifier": "USA",
"text": "I found somwthing Interesting",
"relation_type": "ImportanceRelation",
"annotation_type": "Opinionholder",
"relation_start": {
"annotatedtext": "We",
"annotatedtext_represents": "OpinionHolder",
"annotatedtext_startposition": 146,
"annotatedtext_endposition": 148
},
"relation_end": {
"annotatedtext": "information",
"annotatedtext_represents": "StanceTarget",
"annotatedtext_startposition": 239,
"annotatedtext_endposition": 250
}
},
{
"bi-document-identifier": "USB",
"text": "This is quite new to me",
"relation_type": "ImportanceRelation",
"annotation_type": "Opinionholder",
"relation_start": {
"annotatedtext": "Dr. A_PERSON",
"annotatedtext_represents": "OpinionHolder",
"annotatedtext_startposition": 0,
"annotatedtext_endposition": 12
},
"relation_end": {
"annotatedtext": "MOA of CV risk reduction",
"annotatedtext_represents": "StanceTarget",
"annotatedtext_startposition": 29,
"annotatedtext_endposition": 53
}
}
]
how to read as a multi header dataframe. The orient and typ options in pandas.read_json() give me some errors like below
TypeError: list indices must be integers or slices, not str
Upvotes: 1
Views: 1388
Reputation: 30991
Start from read_json with records orientation:
df = pd.read_json('Input.json', orient='records')
Then convert both relation columns to "MultiIndexed" version:
d = { tt: df[tt].apply(pd.Series) for tt in ['relation_start', 'relation_end'] }
wrk1 = pd.concat(d, axis=1, keys=d.keys())
The added MultiIndex level is the top level (relation_start and relation_end) and index keys in source values become the second level.
The third step is to convert 4 initial columns, also to MultiIndex version, but this time the added level is the second level, containing only empty strings:
wrk2 = df.iloc[:, 0:4]
wrk2.columns = pd.MultiIndex.from_product([wrk2.columns, ['']])
And the last thing to do is to contatenate both wrk DataFrames (on the index):
result = wrk2.join(wrk1)
Or if you want a shorter code, don't create wrk1 (d is enough), and place corresponding expression in the final instruction:
result = wrk.join(pd.concat(d, axis=1, keys=d.keys()))
Note: json_normalize
proposed in the other solution is undeniably shorter,
but the result has plain (single level) column index.
As I understood, you want just a MultiIndex on columns.
Upvotes: 1
Reputation: 1441
Why don't you use pandas' json_normalize
?
from pandas import json_normalize
df = json_normalize(data)
df.head(2)
Otherwise, this seems to cover your specific usecase:
Creating a multi-header table from JSON
Upvotes: 2