Reputation: 55
I have a JSON file structured like [1] below; as you can see, multiple keywords are attached to one newspaper article. I want to normalize the JSON into a structure (DataFrame) like [2]. I've tried json_normalize, but that didn't work out as intended; I also did some multi-indexing, but I can't save the results in CSV format and it makes everything more complex. What I want is to get the data into a structure I can analyze, so I can label the whole article as positive or negative based on the extracted keywords.
[2]
╔═══════════════╦════════════╦═══════════════╗
║ url           ║ date       ║ entities.name ║
╠═══════════════╬════════════╬═══════════════╣
║ http://ww.... ║ 2018-12-31 ║ 2018          ║
║               ║            ║ Bill Cosby    ║
║               ║            ║ Actress       ║
║               ║            ║ ...           ║
╚═══════════════╩════════════╩═══════════════╝
[1]
{'lang': 'ENGLISH',
 'date': '2018-12-31T23:46:18Z',
 'url': 'http://www.newschannel6now.com/2018/12/31/cosby-kanye-box-office-diversity-biggest-entertainment-stories/',
 'entities': [{'avgSalience': 1,
               'wikipediaEntry': '2018',
               'type': 'DATE',
               'numMentions': 4,
               'name': '2018',
               'nameNorm': '2018'},
              {'wikipediaEntry': 'Actor',
               'type': 'COMMON',
               'numMentions': 4,
               'avgSalience': 0.72,
               'nameNorm': 'actres',
               'name': 'Actress'},
              {'wikipediaEntry': 'Bill Cosby',
               'type': 'PROPER',
               'numMentions': 2,
               'avgSalience': 0.57,
               'nameNorm': 'bill cosby',
               'name': 'Bill Cosby'},
              {'name': 'music superstar',
               'nameNorm': 'music superstar',
               'avgSalience': 0.02,
               'type': 'COMMON',
               'numMentions': 1}]}
I managed by using groupby and joining the values into a single column:
# df holds the flattened table: one row per (article, entity) pair
df.groupby(['url', 'date'], as_index=False).agg({
    'name': lambda x: ', '.join(x),
    'numMentions': lambda x: ', '.join(map(str, x)),
    'avgSalience': lambda x: ', '.join(map(str, x))
})
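For reference, a minimal runnable sketch of how df can be built so the snippet above runs, assuming data holds the parsed dict from [1]:

import pandas as pd

# Flatten manually: one row per entity, carrying url and date along.
rows = [{'url': data['url'], 'date': data['date'], **ent}
        for ent in data['entities']]
df = pd.DataFrame(rows)

The grouped result is a plain DataFrame, so it can be written out with .to_csv('articles.csv', index=False) (the file name is illustrative).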
Upvotes: 1
Views: 184
Reputation: 28644
You can use json_normalize:
from pandas import json_normalize

# 'entities' is the record path; url and date are repeated as metadata.
json_normalize(data, 'entities', ['url', 'date']).filter(['url', 'date', 'name'])
url date name
0 http://www.newschannel6now.com/2018/12/31/cosb... 2018-12-31T23:46:18Z 2018
1 http://www.newschannel6now.com/2018/12/31/cosb... 2018-12-31T23:46:18Z Actress
2 http://www.newschannel6now.com/2018/12/31/cosb... 2018-12-31T23:46:18Z Bill Cosby
3 http://www.newschannel6now.com/2018/12/31/cosb... 2018-12-31T23:46:18Z music superstar
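If the file holds a list of such records rather than a single dict, json_normalize accepts the list directly; a sketch, with the file name articles.json as an assumption:

import json
import pandas as pd

with open('articles.json') as f:  # hypothetical file name
    records = json.load(f)        # a list of dicts shaped like [1]

df = pd.json_normalize(records, 'entities', ['url', 'date'])
df.filter(['url', 'date', 'name']).to_csv('entities.csv', index=False)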
Here is another option. I am relying on a library called nested_lookup to pull the data:
import pandas as pd
from nested_lookup import nested_lookup

# Pull every value stored under each key, however deeply nested.
keys = ['url', 'date', 'name']
res = [nested_lookup(key, data) for key in keys]

# The three lists have different lengths, so concat pads with NaN.
df = pd.concat([pd.DataFrame(ent) for ent in res], axis=1)
df = df.set_axis(['url', 'date', 'entities.name'], axis='columns')
df
url date entities.name
0 http://www.newschannel6now.com/2018/12/31/cosb... 2018-12-31T23:46:18Z 2018
1 NaN NaN Actress
2 NaN NaN Bill Cosby
3 NaN NaN music superstar
Note how json_normalize repeats the url and date on every row, while the nested_lookup option leaves NaNs in the shorter columns instead.
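If the repeated-url shape is preferred, those gaps can be forward-filled afterwards, e.g.:

# Carry the article-level values down over the NaN rows.
df[['url', 'date']] = df[['url', 'date']].ffill()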
Upvotes: 1