I have a dict of lists of dicts. What is the most efficient way to convert this into a DataFrame in pandas?
data = {
"0a2":[{"a":1,"b":1},{"a":1,"b":1,"c":1},{"a":1,"b":1}],
"279":[{"a":1,"b":1,"c":1},{"a":1,"b":1,"d":1}],
"ae2":[{"a":1,"b":1},{"a":1,"d":1},{"a":1,"b":1},{"a":1,"d":1}],
#...
}
import pandas as pd
pd.DataFrame(data, columns=["a","b","c","d"])
What I've tried:
One solution is to denormalize the data like this, by duplicating the "id" keys:
bad_data = [
{"a":1,"b":1,"id":"0a2"},{"a":1,"b":1,"c":1,"id":"0a2"},{"a":1,"b":1,"id":"0a2"},
{"a":1,"b":1,"c":1,"id":"279"},{"a":1,"b":1,"d":1,"id":"279"},
{"a":1,"b":1,"id":"ae2"},{"a":1,"d":1,"id":"ae2"},{"a":1,"b":1,"id":"ae2"},{"a":1,"d":1,"id":"ae2"}
]
pd.DataFrame(bad_data, columns=["a","b","c","d","id"])
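The bad_data list does not have to be typed out by hand. A minimal sketch, assuming the dict shape shown above (the "#..." entries omitted), builds the same denormalized rows with a comprehension. It still stores the id once per row, which is the duplication the question wants to avoid.

```python
import pandas as pd

# Same shape as the question's `data`, without the elided "#..." entries.
data = {
    "0a2": [{"a": 1, "b": 1}, {"a": 1, "b": 1, "c": 1}, {"a": 1, "b": 1}],
    "279": [{"a": 1, "b": 1, "c": 1}, {"a": 1, "b": 1, "d": 1}],
    "ae2": [{"a": 1, "b": 1}, {"a": 1, "d": 1}, {"a": 1, "b": 1}, {"a": 1, "d": 1}],
}

# Copy each inner dict and tag the copy with its outer key under "id".
flat = [{**row, "id": key} for key, rows in data.items() for row in rows]

df = pd.DataFrame(flat, columns=["a", "b", "c", "d", "id"])
print(df)
```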
But my data is very large, so I'd prefer a solution that uses a hierarchical index (a MultiIndex) instead of repeating the id in every row.
IIUC, you can do this (recommended):
new_df = pd.concat((pd.DataFrame(d) for d in data.values()), keys=data.keys())
Output:
       a    b    c    d
0a2 0  1  1.0  NaN  NaN
    1  1  1.0  1.0  NaN
    2  1  1.0  NaN  NaN
279 0  1  1.0  1.0  NaN
    1  1  1.0  NaN  1.0
ae2 0  1  1.0  NaN  NaN
    1  1  NaN  NaN  1.0
    2  1  1.0  NaN  NaN
    3  1  NaN  NaN  1.0
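pd.concat also accepts a dict directly, in which case the dict's keys become the outer index level, so the keys argument is not needed. A minimal sketch, assuming the question's data shape; the level names "id" and "row" are illustrative, not required:

```python
import pandas as pd

data = {
    "0a2": [{"a": 1, "b": 1}, {"a": 1, "b": 1, "c": 1}, {"a": 1, "b": 1}],
    "279": [{"a": 1, "b": 1, "c": 1}, {"a": 1, "b": 1, "d": 1}],
    "ae2": [{"a": 1, "b": 1}, {"a": 1, "d": 1}, {"a": 1, "b": 1}, {"a": 1, "d": 1}],
}

# Passing a mapping: its keys become the outer level of the MultiIndex,
# and `names` labels both levels.
df = pd.concat({k: pd.DataFrame(v) for k, v in data.items()}, names=["id", "row"])

# All rows for one outer key, indexed by the inner "row" level.
print(df.loc["279"])
```

df.xs("279", level="id") is an equivalent way to select one group by level name.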
Or, keeping the key as an ordinary ID column instead of an index level:
pd.concat(pd.DataFrame(v).assign(ID=k) for k,v in data.items())
Output:
   a    b    c   ID    d
0  1  1.0  NaN  0a2  NaN
1  1  1.0  1.0  0a2  NaN
2  1  1.0  NaN  0a2  NaN
0  1  1.0  1.0  279  NaN
1  1  1.0  NaN  279  1.0
0  1  1.0  NaN  ae2  NaN
1  1  NaN  NaN  ae2  1.0
2  1  1.0  NaN  ae2  NaN
3  1  NaN  NaN  ae2  1.0