Jason Sun
Jason Sun

Reputation: 73

How to transform nested dictionary from BeautifulSoup to pandas dataframe

I used BeautifulSoup to get this data. It seems like a nested dictionary but I failed to transform them into dataframe. The type is .

{"page":1,"rows":[{"id":"160128","cell":{"fund_id":"160128","bond_ratio":"132.04","report_dt":"2019-12-31","is_outdate":false,"maturity_dt_tips":""}},{"id":"160130","cell":{"fund_id":"160130","bond_ratio":"165.29","report_dt":"2019-12-31","is_outdate":false,"maturity_dt_tips":""}},{"id":"160131","cell":{"fund_id":"160131","bond_ratio":"94.93","report_dt":"2019-12-31","is_outdate":false,"maturity_dt_tips":""}}],"total":3}

How should I get 'values' under each key of 'cell'? Thanks.

    fund_id     bond_ratio     report_dt     is_outdate     maturity_dt_tips
0    160128         132.04    2019-12-31          false
1    160130         165.29    2019-12-31          false
2    160131          94.93    2019-12-31          false

Upvotes: 2

Views: 292

Answers (1)

jezrael
jezrael

Reputation: 863351

Use json.json_normalize:

d = {"page":1,"rows":[{"id":"160128","cell":{"fund_id":"160128","bond_ratio":"132.04","report_dt":"2019-12-31","is_outdate":False,"maturity_dt_tips":""}},{"id":"160130","cell":{"fund_id":"160130","bond_ratio":"165.29","report_dt":"2019-12-31","is_outdate":False,"maturity_dt_tips":""}},{"id":"160131","cell":{"fund_id":"160131","bond_ratio":"94.93","report_dt":"2019-12-31","is_outdate":False,"maturity_dt_tips":""}}],"total":3}

from pandas.io.json import json_normalize

df = json_normalize(d['rows']) 
print (df)
       id cell.fund_id cell.bond_ratio cell.report_dt  cell.is_outdate  \
0  160128       160128          132.04     2019-12-31            False   
1  160130       160130          165.29     2019-12-31            False   
2  160131       160131           94.93     2019-12-31            False   

  cell.maturity_dt_tips  
0                        
1                        
2            

Then if necessary remove values before . in columns names add str.split and indexing last values of lists by [-1]:

df.columns = df.columns.str.split('.').str[-1]
print (df)
       id fund_id bond_ratio   report_dt  is_outdate maturity_dt_tips
0  160128  160128     132.04  2019-12-31       False                 
1  160130  160130     165.29  2019-12-31       False                 
2  160131  160131      94.93  2019-12-31       False                 

Upvotes: 3

Related Questions