Reputation: 137
I have a data structure that is populated dynamically, so the number of keys and sub-keys is unknown. I want to convert it into a Pandas df. The structure looks like this:
datastore = {
"user1":{
"time1":[1,2,3,4],
"time2":[5,6,7,8],
"time3":[1,2,3,4] },
"user2":{
"time1":[1,2,3,4],
"time2":[5,6,7,8] }
}
It is a dict of dicts whose values are lists.
I want to convert it into a pandas df like this:
index users times x y z k
0 user1 time1 1 2 3 4
1 user1 time2 5 6 7 8
2 user1 time3 1 2 3 4
3 user2 time1 1 2 3 4
4 user2 time2 5 6 7 8
....
I've tried pd.DataFrame(dict) and the from_dict method, but couldn't get either to work. Any help would be appreciated.
EDIT: Sorry about the syntax error, fixed
Upvotes: 2
Views: 1340
Reputation: 294278
Option 1
pd.DataFrame.from_dict(datastore, 'index').stack() \
.rename_axis(['users', 'times']) \
.apply(pd.Series, index=list('xyzk')).reset_index()
users times x y z k
0 user1 time1 1 2 3 4
1 user1 time2 5 6 7 8
2 user1 time3 1 2 3 4
3 user2 time1 1 2 3 4
4 user2 time2 5 6 7 8
Option 2
pd.DataFrame(
[[u, t] + l for u, td in datastore.items() for t, l in td.items()],
columns='users times x y z k'.split()
)
users times x y z k
0 user1 time1 1 2 3 4
1 user1 time2 5 6 7 8
2 user1 time3 1 2 3 4
3 user2 time1 1 2 3 4
4 user2 time2 5 6 7 8
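For readers who find the nested comprehension in Option 2 dense, here is the same idea unrolled into explicit loops. This is only a sketch of what the comprehension does; the names rows, user, times, time and values are mine, not from the answer, and it assumes every inner list has exactly four elements, as in the question.

```python
import pandas as pd

datastore = {
    "user1": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8], "time3": [1, 2, 3, 4]},
    "user2": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8]},
}

# Build one flat row per (user, time) pair: [user, time, x, y, z, k].
rows = []
for user, times in datastore.items():
    for time, values in times.items():
        rows.append([user, time] + list(values))

# Assumption: each list has four values, matching the four value columns.
df = pd.DataFrame(rows, columns=["users", "times", "x", "y", "z", "k"])
print(df)
```

Because the rows are built from whatever keys exist, a user with fewer time entries (user2 here) simply contributes fewer rows.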
Timing
%timeit pd.DataFrame.from_dict(datastore, 'index').stack().rename_axis(['users', 'times']).apply(pd.Series, index=list('xyzk')).reset_index()
%timeit pd.DataFrame([[u, t] + l for u, td in datastore.items() for t, l in td.items()], columns='users times x y z k'.split())
100 loops, best of 3: 2.72 ms per loop
1000 loops, best of 3: 556 µs per loop
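%timeit only works inside IPython/Jupyter. If you want to repeat the comparison from a plain script, something like the following sketch with the standard-library timeit module should do. The wrapper names option1, option2 and best_ms are hypothetical, and the numbers will differ by machine and pandas version, so no expected timings are shown.

```python
import timeit

import pandas as pd

datastore = {
    "user1": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8], "time3": [1, 2, 3, 4]},
    "user2": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8]},
}


def option1():
    # Option 1 from the answer: stack, then expand each list with pd.Series.
    return (
        pd.DataFrame.from_dict(datastore, orient="index")
        .stack()
        .rename_axis(["users", "times"])
        .apply(pd.Series, index=list("xyzk"))
        .reset_index()
    )


def option2():
    # Option 2 from the answer: build flat rows with a list comprehension.
    return pd.DataFrame(
        [[u, t] + l for u, td in datastore.items() for t, l in td.items()],
        columns="users times x y z k".split(),
    )


def best_ms(func, number=100, repeat=3):
    # Hypothetical helper: best average time per call, in milliseconds.
    runs = timeit.repeat(func, number=number, repeat=repeat)
    return min(runs) / number * 1000


print(f"option1: {best_ms(option1):.3f} ms per loop")
print(f"option2: {best_ms(option2):.3f} ms per loop")
```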
DEBUG
If you copy and paste the code below, it should run as is. Please try it and report back whether it did.
import pandas as pd
datastore = {
"user1":{
"time1":[1,2,3,4],
"time2":[5,6,7,8],
"time3":[1,2,3,4] },
"user2":{
"time1":[1,2,3,4],
"time2":[5,6,7,8]}
}
pd.DataFrame.from_dict(datastore, 'index').stack() \
.rename_axis(['users', 'times']) \
.apply(pd.Series, index=list('xyzk')).reset_index()
Upvotes: 4
Reputation: 8493
Here's an approach
import pandas as pd
datastore = {
"user1":{
"time1":[1,2,3,4],
"time2":[5,6,7,8],
"time3":[1,2,3,4] },
"user2":{
"time1":[1,2,3,4],
"time2":[5,6,7,8]}
}
We can pass the dict to pd.DataFrame(), then stack() the result and reset_index() it
df = pd.DataFrame(datastore).stack().reset_index()
print(df)
level_0 level_1 0
0 time1 user1 [1, 2, 3, 4]
1 time1 user2 [1, 2, 3, 4]
2 time2 user1 [5, 6, 7, 8]
3 time2 user2 [5, 6, 7, 8]
4 time3 user1 [1, 2, 3, 4]
Now we 'split' the list in column 0 by applying pd.Series, then join the result back to level_1 and level_0. After renaming the columns, we're done
df = df[['level_1', 'level_0']].join(df[0].apply(pd.Series))
df.columns = ['users', 'times', 'x', 'y', 'z', 'k']
print(df)
users times x y z k
0 user1 time1 1 2 3 4
1 user2 time1 1 2 3 4
2 user1 time2 5 6 7 8
3 user2 time2 5 6 7 8
4 user1 time3 1 2 3 4
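Note that the rows above come out grouped by time, whereas the question lists them grouped by user. If that order matters, a sort on the two key columns should fix it. A minimal sketch, where df is rebuilt by hand from the printed result above and ordered is my own name; it assumes a plain string sort is acceptable (so "time10" would sort before "time2").

```python
import pandas as pd

# The result printed above, rebuilt literally so the sketch is self-contained.
df = pd.DataFrame(
    {
        "users": ["user1", "user2", "user1", "user2", "user1"],
        "times": ["time1", "time1", "time2", "time2", "time3"],
        "x": [1, 1, 5, 5, 1],
        "y": [2, 2, 6, 6, 2],
        "z": [3, 3, 7, 7, 3],
        "k": [4, 4, 8, 8, 4],
    }
)

# Sort by user first, then time, and renumber the index from 0.
ordered = df.sort_values(["users", "times"]).reset_index(drop=True)
print(ordered)
```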
Upvotes: 2