I am new to Python and have not been able to find a good answer to my problem after looking for a while. I am trying to create a pandas DataFrame from a nested dictionary whose values are either dictionaries or lists of dictionaries.
My nested dictionary is the following:
{'category_1': [{'a': '151',
'b': '116',
'c': '86'}],
'category_2': [{'d': '201',
'e': '211',
'f': '252'},
{'d': '-1',
'e': '-9',
'f': '-7'}],
'category_3': {'g': 'Valid',
'h': None,
'i': False,
'j': False},
'category_4': {'k': None,
'l': None,
'm': None,
'n': None}}
And my output should be
a b c d e f g h i j k l m n
0 151 116 86 201,-1 211,-9 252,-7 Valid None False False None None None None
What I tried: I can handle categories 1, 3, and 4, but I couldn't figure out category 2. I tried concat and a nested for loop to get it:
ex=pd.concat([pd.Series(d) for d in (eg1)], axis=1).T
and then merging the results. As I said, I couldn't get the whole thing to work!
Upvotes: 1
Views: 919
Reputation: 1261
I wrote a short recursive function that returns a Series, or a concatenation of several Series if one of the keys in your dict (e.g. category_2) contains a list of multiple dicts.
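import pandas as pd

def expand(x):
    # A dict becomes a single Series keyed by its own keys.
    if isinstance(x, dict):
        return pd.Series(x)
    # A list of dicts becomes the concatenation of each item's Series.
    elif isinstance(x, list):
        return pd.concat([expand(i) for i in x])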
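To see what the function returns for a list of several dicts, here is a minimal sketch. It assumes pandas is imported as pd, and the name `sample` is only an illustration holding the category_2 value from the question.

```python
import pandas as pd

def expand(x):
    # A dict becomes one Series; a list becomes the concatenation of its items.
    if isinstance(x, dict):
        return pd.Series(x)
    elif isinstance(x, list):
        return pd.concat([expand(i) for i in x])

# Hypothetical sample: the category_2 list from the question.
sample = [{'d': '201', 'e': '211', 'f': '252'},
          {'d': '-1', 'e': '-9', 'f': '-7'}]

s = expand(sample)
print(list(s.index))  # the labels d, e, f appear twice
print(list(s))        # the values from both dicts, in order
```

The repeated index labels are what the later grouping step relies on to combine the two dicts.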
If I start with the dictionary that you pasted in your example above:
d = {'category_1': [{'a': '151',
'b': '116',
'c': '86'}],
'category_2': [{'d': '201',
'e': '211',
'f': '252'},
{'d': '-1',
'e': '-9',
'f': '-7'}],
'category_3': {'g': 'Valid',
'h': None,
'i': False,
'j': False},
'category_4': {'k': None,
'l': None,
'm': None,
'n': None}}
Then it's just a matter of concatenating all the series created by the recursive method I wrote:
output = pd.concat([expand(value) for key, value in d.items()])
Then I merge any duplicate index labels so that their items appear in one cell, separated by commas. I also reshape the Series into a DataFrame with one row and several columns:
output = pd.DataFrame(output.groupby(output.index).apply(lambda x: ','.join(x.astype(str)))).T
This results in a dataframe that matches your desired output:
output
a b c d e f g h i j k l m n
0 151 116 86 201,-1 211,-9 252,-7 Valid None False False None None None None
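The merging step can be tried in isolation. The sketch below is an illustration under assumptions: the series `s` is made-up data with one repeated label, and it uses `groupby(level=0, sort=False)` with `agg` rather than `groupby(output.index).apply`, which is a swapped-in variant that keeps the labels in order of first appearance instead of sorting them.

```python
import pandas as pd

# Hypothetical series with a repeated index label, as concatenation produces.
s = pd.Series(['201', 'Valid', '-1', False],
              index=['d', 'g', 'd', 'i'], dtype=object)

# Group on the index labels and join each group's values as strings.
merged = s.groupby(level=0, sort=False).agg(
    lambda x: ','.join(map(str, x))
)

# Turn the merged Series into a one-row DataFrame.
row = merged.to_frame().T
print(row)
```

Note that every value ends up as a string after the join, so False becomes the text 'False'.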
Upvotes: 2
Reputation: 2019
The code below recursively flattens the input structure, which can contain lists or other dicts. When it reaches a leaf, it adds the value to a flattened dict, which is then converted to a DataFrame.
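import pandas as pd

flattened_dict = {}

def flatten(obj, name=''):
    if isinstance(obj, dict):
        for key, value in obj.items():
            flatten(value, key)
    elif isinstance(obj, list):
        for e in obj:
            flatten(e, name)
    else:
        # Leaf value: the string 'null' stands in for None.
        if obj == 'null':
            obj = None
        # Append so that keys repeated across list items (category_2) keep every value.
        flattened_dict.setdefault(name, []).append(obj)

flatten(eg1)  # eg1 is defined below
# Join repeated values with commas; single values stay as they are.
df = pd.DataFrame({k: [v[0] if len(v) == 1 else ','.join(map(str, v))]
                   for k, v in flattened_dict.items()})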
Please note that null has to be written as the string "null", because Python has no null literal; the function converts it to None. The definition of the original dict is:
eg1 = {
"my_list": {
"category_1": [
{
"a": "151",
"b": "116",
"c": "86"
}
],
"category_2": [
{
"d": "201",
"e": "211",
"f": "252"
},
{
"d": "-1",
"e": "-9",
"f": "-7"
}
],
"category_3": {
"g": "Valid",
"h": "null",
"i": "Invalid",
"j": "Invalid"
},
"category_4": {
"k": "null",
"l": "null",
"m": "null",
"n": "null"
}
}
}
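Because flattened_dict lives at module level, calling flatten a second time keeps adding to the same dict. One possible alternative, sketched here under assumptions, passes the accumulator explicitly and keeps every value found under a repeated key. The names `collect_leaves`, `data`, and `row` are hypothetical, and `data` is a shortened stand-in for eg1.

```python
from collections import defaultdict

import pandas as pd

def collect_leaves(obj, name='', out=None):
    """Gather every leaf value under its nearest dict key."""
    if out is None:
        out = defaultdict(list)
    if isinstance(obj, dict):
        for key, value in obj.items():
            collect_leaves(value, key, out)
    elif isinstance(obj, list):
        for item in obj:
            collect_leaves(item, name, out)
    else:
        # The string 'null' stands in for None, as in the answer above.
        out[name].append(None if obj == 'null' else obj)
    return out

# Shortened, hypothetical stand-in for eg1.
data = {'my_list': {
    'category_2': [{'d': '201', 'e': '211'}, {'d': '-1', 'e': '-9'}],
    'category_3': {'g': 'Valid', 'h': 'null'},
}}

leaves = collect_leaves(data)
# Join repeated keys with commas; keep single values (including None) unchanged.
row = {k: v[0] if len(v) == 1 else ','.join(map(str, v))
       for k, v in leaves.items()}
df = pd.DataFrame([row])
print(df)
```

Each call starts from a fresh accumulator, so the function can be reused on several inputs without clearing any shared state.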
Upvotes: 1