Abhishek Thakur
Abhishek Thakur

Reputation: 17005

convert list of pandas dataframe to json

I have a list of pandas dataframes df_list. In that list there are 4 dataframes which look alike in terms of column names and number of rows.

one example dataframe, at index 0 of the list:

df_list[0]:

str_col          vals1               vals2
hi             [1,2,3,5]      [0.4,0.4,0.5, 0.1]
hagha          [1,3,3,5]      [0.1,0.4,0.5, 0.1]
me             [4,4,3,5]      [0.9,0.4,0.5, 0.1]
i know         [1,4,3,5]      [0.1,0.7,0.9, 0.0]

In all dataframe the str_col column remains the same. it does not change. only the vals1 and vals2 change. vals1 and vals2 are lists. str_col is string.

What i want in the end is a json from the list of dataframes:

hi
       - list index 0: 
               - vals1
               - vals2
       - list index 1: 
               - vals1
               - vals2
       - list index 2: 
               - vals1
               - vals2
       - list index 3: 
               - vals1
               - vals2
       - list index 4: 
               - vals1
               - vals2

hagha
       - list index 0: 
               - vals1
               - vals2
       - list index 1: 
               - vals1
               - vals2
       - list index 2: 
               - vals1
               - vals2
       - list index 3: 
               - vals1
               - vals2
       - list index 4: 
               - vals1
               - vals2

and so on

better version below: (please dont compare values of vals1 and vals2 with the example above).

{
    "final_json": {
        "hi": {
            "0": {
                "vals1": [
                    1,
                    2,
                    3,
                    5
                ],
                "vals2": [
                    0.4,
                    0.4,
                    0.5,
                    0.1
                ]
            },
            "1": {
                "vals1": [
                    8,
                    5,
                    5,
                    3
                ],
                "vals2": [
                    1,
                    0,
                    0,
                    0
                ]
            },
            "2": {
                "vals1": [
                    2,
                    3,
                    6,
                    7
                ],
                "vals2": [
                    1,
                    0,
                    0,
                    0
                ]
            },
            "3": {
                "vals1": [
                    3,
                    3,
                    5,
                    3
                ],
                "vals2": [
                    1,
                    0,
                    0,
                    0
                ]
            }
        },
        "hagha": {
            "0": {
                "vals1": [
                    6,
                    8,
                    9,
                    0
                ],
                "vals2": [
                    0.76,
                    0.76,
                    0.36,
                    0.363
                ]
            },
            "1": {
                "vals1": [
                    2,
                    3,
                    4,
                    6
                ],
                "vals2": [
                    0.63,
                    0.36,
                    0.46,
                    0.6
                ]
            },
            "2": {
                "vals1": [
                    3,
                    6,
                    5,
                    5
                ],
                "vals2": [
                    0.4,
                    0.64,
                    0.46,
                    0.456                
                    ]
            },
            "3": {
                "vals1": [
                    4,
                    6,
                    3,
                    2                
                ],
                "vals2": [
                    0.657,
                    0.675,
                    0.64,
                    0.6                
                ]
            }



 .
 .
 .
 .
 .
                ]
            }
        }
    }
}

Currently I'm looping over all the dataframes in the list and it is too slow. I would like to know if a faster solution to achieve this exists (may to_json) ?

Upvotes: 2

Views: 1715

Answers (1)

jezrael
jezrael

Reputation: 862511

import pandas as pd
import json
import pprint

I think you can first concat list with keys parameter for distinguish each DataFrame:

df = pd.concat(dfs, keys=range(len(dfs)))
#print (df)

Then remove first level of MultiIndex, groupby and create dictionaries, last convert output to dict by to_dict:

d = (df.reset_index(level=1, drop=True)
      .groupby('str_col')['vals1','vals2']
      .apply(lambda x: x.to_dict(orient='index'))
      .to_dict()      
      )
#add start of json
d = {"final_json": d}

pprint.pprint(d)
{'final_json': {'hagha': {0: {'vals1': [1, 3, 3, 5],
                              'vals2': [0.1, 0.4, 0.5, 0.1]},
                          1: {'vals1': [10, 30, 3, 5],
                              'vals2': [0.17, 0.47, 0.57, 0.17]}},
                'hi': {0: {'vals1': [1, 2, 3, 5],
                           'vals2': [0.4, 0.4, 0.5, 0.1]},
                       1: {'vals1': [10, 20, 30, 50],
                           'vals2': [0.48, 0.48, 0.58, 0.18]}},
                'i know': {0: {'vals1': [1, 4, 3, 5],
                               'vals2': [0.1, 0.7, 0.9, 0.0]},
                           1: {'vals1': [1, 4, 3, 5],
                               'vals2': [0.1, 0.7, 0.9, 0.0]}},
                'me': {0: {'vals1': [4, 4, 3, 5],
                           'vals2': [0.9, 0.4, 0.5, 0.1]},
                       1: {'vals1': [4, 4, 3, 5],
                           'vals2': [0.9, 0.4, 0.5, 0.1]}}}}

#convert to json                 
json = json.dumps(d)

Setup:

df1 = pd.DataFrame({'str_col': ['hi', 'hagha', 'me', 'i know'], 'vals1': [[1, 2, 3, 5], [1, 3, 3, 5], [4, 4, 3, 5], [1, 4, 3, 5]], 'vals2': [[0.4, 0.4, 0.5, 0.1], [0.1, 0.4, 0.5, 0.1], [0.9, 0.4, 0.5, 0.1], [0.1, 0.7, 0.9, 0.0]]})
print (df1)
#  str_col         vals1                 vals2
#0      hi  [1, 2, 3, 5]  [0.4, 0.4, 0.5, 0.1]
#1   hagha  [1, 3, 3, 5]  [0.1, 0.4, 0.5, 0.1]
#2      me  [4, 4, 3, 5]  [0.9, 0.4, 0.5, 0.1]
#3  i know  [1, 4, 3, 5]  [0.1, 0.7, 0.9, 0.0]

df2 = pd.DataFrame({'str_col': ['hi', 'hagha', 'me', 'i know'], 'vals1': [[10, 20, 30, 50], [10, 30, 3, 5], [4, 4, 3, 5], [1, 4, 3, 5]], 'vals2': [[0.48, 0.48, 0.58, 0.18], [0.17, 0.47, 0.57, 0.17], [0.9, 0.4, 0.5, 0.1], [0.1, 0.7, 0.9, 0.0]]})
print (df2)
#  str_col             vals1                     vals2
#0      hi  [10, 20, 30, 50]  [0.48, 0.48, 0.58, 0.18]
#1   hagha    [10, 30, 3, 5]  [0.17, 0.47, 0.57, 0.17]
#2      me      [4, 4, 3, 5]      [0.9, 0.4, 0.5, 0.1]
#3  i know      [1, 4, 3, 5]      [0.1, 0.7, 0.9, 0.0]

#create list of DataFrames
dfs = [df1, df2]

Upvotes: 3

Related Questions