user8149657
user8149657

Reputation: 553

How to convert JSON data inside a pandas column into new columns

I have this short version of ADSB json data and would like to convert it into dataFrame columns as Icao, Alt, Lat, Long, Spd, Cou.....

After Alperen told me to do this

df = pd.read_json('2016-06-20-2359Z.json', lines=True), 

I can load it into a DataFrame. However, df.acList is

[{'Id': 10537990, 'Rcvr': 1, 'HasSig': False, ...
Name: acList, dtype: object

How can I get the Icao, Alt, Lat, Long, Spd, Cou data?

 "src":1,
   "feeds":[  
      {  
         "id":1,
         "name":"ADSBexchange.com",
         "polarPlot":false
      }
   ],
   "srcFeed":1,
   "showSil":true,
   "showFlg":true,
   "showPic":true,
   "flgH":20,
   "flgW":85,
   "acList":[  
      {  
         "Id":11281748,
         "Rcvr":1,
         "HasSig":false,
         "Icao":"AC2554",
         "Bad":false,
         "Reg":"N882AS",
         "FSeen":"\/Date(1466467166951)\/",
         "TSecs":3,
         "CMsgs":1,
         "AltT":0,
         "Tisb":false,
         "TrkH":false,
         "Type":"CRJ2",
         "Mdl":"2001 
BOMBARDIER INC 
 CL-600-2B19",
         "Man":"Bombardier",
         "CNum":"7503",
         "Op":"EXPRESSJET AIRLINES INC - ATLANTA, GA",
         "OpIcao":"ASQ",
         "Sqk":"",
         "VsiT":0,
         "WTC":2,
         "Species":1,
         "Engines":"2",
         "EngType":3,
         "EngMount":1,
         "Mil":false,
         "Cou":"United States",
         "HasPic":false,
         "Interested":false,
         "FlightsCount":0,
         "Gnd":false,
         "SpdTyp":0,
         "CallSus":false,
         "TT":"a",
         "Trt":1,
         "Year":"2001"
      },
      {  
         "Id":11402205,
         "Rcvr":1,
         "HasSig":true,
         "Sig":110,
         "Icao":"ADFBDD",
         "Bad":false,
         "FSeen":"\/Date(1466391940977)\/",
         "TSecs":75229,
         "CMsgs":35445,
         "Alt":8025,
         "GAlt":8025,
         "AltT":0,
         "Call":"TEST1234",
         "Tisb":false,
         "TrkH":false,
         "Sqk":"0262",
         "Help":false,
         "VsiT":0,
         "WTC":0,
         "Species":0,
         "EngType":0,
         "EngMount":0,
         "Mil":true,
         "Cou":"United States",
         "HasPic":false,
         "Interested":false,
         "FlightsCount":0,
         "Gnd":true,
         "SpdTyp":0,
         "CallSus":false,
         "TT":"a",
         "Trt":1
      }
   ],
   "totalAc":4231,
   "lastDv":"636019887431643594",
   "shtTrlSec":61,
   "stm":1466467170029
}
</pre>

Upvotes: 47

Views: 79820

Answers (5)

Felipe Augusto
Felipe Augusto

Reputation: 8184

In my case I had some missing values (None) then I created a more specific code that also drops the original column after creating the new ones:

for prefix in ['column1', 'column2']:
    df_temp = df[prefix].apply(lambda x: {} if pd.isna(x) else x)
    df_temp = pd.io.json.json_normalize(df_temp)
    df_temp = df_temp.add_prefix(prefix + '_')
    df.drop([prefix], axis=1, inplace=True)
    df = pd.concat([df, df_temp], axis = 1, sort=False)

Upvotes: 0

Sergey Bushmanov
Sergey Bushmanov

Reputation: 25189

If you already have your data in acList column in a pandas DataFrame, simply do:

import pandas as pd
pd.io.json.json_normalize(df.acList[0])

Alt AltT    Bad CMsgs   CNum    Call    CallSus Cou EngMount    EngType ... Sqk TSecs   TT  Tisb    TrkH    Trt Type    VsiT    WTC Year
0   NaN 0   False   1   7503    NaN False   United States   1   3   ...     3   a   False   False   1   CRJ2    0   2   2001
1   8025.0  0   False   35445   NaN TEST1234    False   United States   0   0   ... 0262    75229   a   False   False   1   NaN 0   0   NaN

Since pandas 1.0 the imports should be:

import pandas as pd
pd.json_normalize(df.acList[0])

Upvotes: 50

Jonne Kleijer
Jonne Kleijer

Reputation: 691

Since pandas 1.0, json_normalize is available in the top-level namespace. Therefore use:

import pandas as pd
pd.json_normalize(df.acList[0])

Upvotes: 8

blockw
blockw

Reputation: 39

I can't comment yet on ThinkBonobo's answer but in case the JSON in the column isn't exactly a dictionary you can keep doing .apply until it is. So in my case

import json
import pandas as pd

json_normalize(
    df
    .theColumnWithJson
    .apply(json.loads)
    .apply(lambda x: x[0]) # the inner JSON is list with the dictionary as the only item
)

Upvotes: 3

ThinkBonobo
ThinkBonobo

Reputation: 16515

@Sergey's answer solved the issue for me but I was running into issues because the json in my data frame column was kept as a string and not as an object. I had to add the additional step of mapping the column:

import json
import pandas as pd
pd.io.json.json_normalize(df.acList.apply(json.loads))

Upvotes: 25

Related Questions