Reputation: 17005
I have a list of pandas DataFrames, `df_list`. The list contains 4 DataFrames that share the same column names and number of rows.
One example DataFrame, at index 0 of the list (`df_list[0]`):
str_col         vals1                 vals2
hi        [1, 2, 3, 5]  [0.4, 0.4, 0.5, 0.1]
hagha     [1, 3, 3, 5]  [0.1, 0.4, 0.5, 0.1]
me        [4, 4, 3, 5]  [0.9, 0.4, 0.5, 0.1]
i know    [1, 4, 3, 5]  [0.1, 0.7, 0.9, 0.0]
In all DataFrames the `str_col` column stays the same; it does not change. Only `vals1` and `vals2` change. `vals1` and `vals2` are lists; `str_col` is a string.
What I want in the end is a single JSON built from the list of DataFrames:
hi
- list index 0:
  - vals1
  - vals2
- list index 1:
  - vals1
  - vals2
- list index 2:
  - vals1
  - vals2
- list index 3:
  - vals1
  - vals2
hagha
- list index 0:
  - vals1
  - vals2
- list index 1:
  - vals1
  - vals2
- list index 2:
  - vals1
  - vals2
- list index 3:
  - vals1
  - vals2
and so on
A better version is below (please don't compare the values of `vals1` and `vals2` with the example above; they are made up):
{
"final_json": {
"hi": {
"0": {
"vals1": [
1,
2,
3,
5
],
"vals2": [
0.4,
0.4,
0.5,
0.1
]
},
"1": {
"vals1": [
8,
5,
5,
3
],
"vals2": [
1,
0,
0,
0
]
},
"2": {
"vals1": [
2,
3,
6,
7
],
"vals2": [
1,
0,
0,
0
]
},
"3": {
"vals1": [
3,
3,
5,
3
],
"vals2": [
1,
0,
0,
0
]
}
},
"hagha": {
"0": {
"vals1": [
6,
8,
9,
0
],
"vals2": [
0.76,
0.76,
0.36,
0.363
]
},
"1": {
"vals1": [
2,
3,
4,
6
],
"vals2": [
0.63,
0.36,
0.46,
0.6
]
},
"2": {
"vals1": [
3,
6,
5,
5
],
"vals2": [
0.4,
0.64,
0.46,
0.456
]
},
"3": {
"vals1": [
4,
6,
3,
2
],
"vals2": [
0.657,
0.675,
0.64,
0.6
]
}
},
...
}
}
Currently I'm looping over all the DataFrames in the list, and it is too slow. Is there a faster way to achieve this (maybe `to_json`)?
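For reference, this is the kind of per-row loop I mean (a simplified sketch, not my exact code; the data is made up):

```python
import pandas as pd

# Toy stand-ins for the real df_list (values are illustrative only)
df_list = [
    pd.DataFrame({'str_col': ['hi', 'hagha'],
                  'vals1': [[1, 2], [3, 4]],
                  'vals2': [[0.1, 0.2], [0.3, 0.4]]}),
    pd.DataFrame({'str_col': ['hi', 'hagha'],
                  'vals1': [[5, 6], [7, 8]],
                  'vals2': [[0.5, 0.6], [0.7, 0.8]]}),
]

result = {}
for i, df in enumerate(df_list):            # one pass per DataFrame
    for row in df.itertuples(index=False):  # one pass per row - this is slow
        result.setdefault(row.str_col, {})[i] = {
            'vals1': row.vals1,
            'vals2': row.vals2,
        }
final_json = {'final_json': result}
```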
Upvotes: 2
Views: 1715
Reputation: 862511
import pandas as pd
import json
import pprint
I think you can first `concat` the list with the `keys` parameter to distinguish each `DataFrame`:
df = pd.concat(dfs, keys=range(len(dfs)))
#print (df)
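As a quick illustration of what `keys` does (toy frames, names are illustrative): each DataFrame's position in the list becomes the first level of the resulting `MultiIndex`.

```python
import pandas as pd

a = pd.DataFrame({'str_col': ['hi'], 'vals1': [[1, 2]]})
b = pd.DataFrame({'str_col': ['hi'], 'vals1': [[3, 4]]})

df = pd.concat([a, b], keys=range(2))
# The first MultiIndex level records which source DataFrame each row came from
print(df.index.get_level_values(0).tolist())  # → [0, 1]
```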
Then remove the first level of the `MultiIndex`, `groupby` by `str_col` and create a dictionary per group, and finally convert the output to a `dict` with `to_dict`:
d = (df.reset_index(level=1, drop=True)
       .groupby('str_col')[['vals1', 'vals2']]
       .apply(lambda x: x.to_dict(orient='index'))
       .to_dict()
     )
#add start of json
d = {"final_json": d}
pprint.pprint(d)
{'final_json': {'hagha': {0: {'vals1': [1, 3, 3, 5],
'vals2': [0.1, 0.4, 0.5, 0.1]},
1: {'vals1': [10, 30, 3, 5],
'vals2': [0.17, 0.47, 0.57, 0.17]}},
'hi': {0: {'vals1': [1, 2, 3, 5],
'vals2': [0.4, 0.4, 0.5, 0.1]},
1: {'vals1': [10, 20, 30, 50],
'vals2': [0.48, 0.48, 0.58, 0.18]}},
'i know': {0: {'vals1': [1, 4, 3, 5],
'vals2': [0.1, 0.7, 0.9, 0.0]},
1: {'vals1': [1, 4, 3, 5],
'vals2': [0.1, 0.7, 0.9, 0.0]}},
'me': {0: {'vals1': [4, 4, 3, 5],
'vals2': [0.9, 0.4, 0.5, 0.1]},
1: {'vals1': [4, 4, 3, 5],
'vals2': [0.9, 0.4, 0.5, 0.1]}}}}
#convert to json (use a new name to avoid shadowing the json module)
json_str = json.dumps(d)
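One caveat worth knowing (small sketch with made-up data): JSON object keys are always strings, so the integer position keys become strings like `"0"` after serialization, which matches the desired output in the question.

```python
import json

d = {'final_json': {'hi': {0: {'vals1': [1, 2]}}}}
s = json.dumps(d)

# After a round trip the integer keys come back as strings
back = json.loads(s)
print(list(back['final_json']['hi'].keys()))  # → ['0']
```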
Setup:
df1 = pd.DataFrame({'str_col': ['hi', 'hagha', 'me', 'i know'], 'vals1': [[1, 2, 3, 5], [1, 3, 3, 5], [4, 4, 3, 5], [1, 4, 3, 5]], 'vals2': [[0.4, 0.4, 0.5, 0.1], [0.1, 0.4, 0.5, 0.1], [0.9, 0.4, 0.5, 0.1], [0.1, 0.7, 0.9, 0.0]]})
print (df1)
# str_col vals1 vals2
#0 hi [1, 2, 3, 5] [0.4, 0.4, 0.5, 0.1]
#1 hagha [1, 3, 3, 5] [0.1, 0.4, 0.5, 0.1]
#2 me [4, 4, 3, 5] [0.9, 0.4, 0.5, 0.1]
#3 i know [1, 4, 3, 5] [0.1, 0.7, 0.9, 0.0]
df2 = pd.DataFrame({'str_col': ['hi', 'hagha', 'me', 'i know'], 'vals1': [[10, 20, 30, 50], [10, 30, 3, 5], [4, 4, 3, 5], [1, 4, 3, 5]], 'vals2': [[0.48, 0.48, 0.58, 0.18], [0.17, 0.47, 0.57, 0.17], [0.9, 0.4, 0.5, 0.1], [0.1, 0.7, 0.9, 0.0]]})
print (df2)
# str_col vals1 vals2
#0 hi [10, 20, 30, 50] [0.48, 0.48, 0.58, 0.18]
#1 hagha [10, 30, 3, 5] [0.17, 0.47, 0.57, 0.17]
#2 me [4, 4, 3, 5] [0.9, 0.4, 0.5, 0.1]
#3 i know [1, 4, 3, 5] [0.1, 0.7, 0.9, 0.0]
#create list of DataFrames
dfs = [df1, df2]
Upvotes: 3