Reputation: 3125
I have a dict whose values are lists, for example:
my_dict = {1: [964725688, 6928857],
...
22: [1667906, 35207807, 685530997, 35207807],
...
}
In this example the longest list has 4 items, but lists could be longer than that.
I would like to convert it to a dataframe like:
1 964725688
1 6928857
...
22 1667906
22 35207807
22 685530997
22 35207807
Upvotes: 6
Views: 2415
Reputation: 19947
import pandas as pd
# Load the dict directly into a DataFrame without loops; shorter lists are padded with NaN
df = pd.DataFrame.from_dict(my_dict, orient='index')
# Unstack, drop the NaN padding, and sort by key if you need to
df.unstack().dropna().sort_index(level=1)
Out[382]:
0 1 964725688.0
1 1 6928857.0
0 22 1667906.0
1 22 35207807.0
2 22 685530997.0
3 22 35207807.0
dtype: float64
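The float64 dtype above is a side effect of the NaN padding that from_dict adds for the shorter lists. As a rough sketch, assuming every value is an integer small enough to survive the round trip through float (true for the numbers here), the result can be cast back and flattened into two columns. The column names key and value are placeholders of mine, not something from the question:

```python
import pandas as pd

my_dict = {1: [964725688, 6928857],
           22: [1667906, 35207807, 685530997, 35207807]}

wide = pd.DataFrame.from_dict(my_dict, orient='index')

long_df = (
    wide.unstack()                       # Series indexed by (position, key)
        .dropna()                        # remove the NaN padding
        .sort_index(level=1)             # group the rows by key
        .astype('int64')                 # undo the float upcast
        .reset_index(level=0, drop=True) # discard the list position
        .rename_axis('key')              # hypothetical column name
        .reset_index(name='value')       # hypothetical column name
)
print(long_df)
```

If the lists could hold very large integers or non-numeric items, the cast would need more care than this.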
Upvotes: 1
Reputation: 13274
Slightly on the functional side, using zip and reduce:
from functools import reduce # if working with Python3
import pandas as pd
d = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}
df = pd.DataFrame(reduce(lambda x,y: x+y, [list(zip([k]*len(v), v)) for k,v in d.items()]))
print(df)
#     0          1
# 0   1  964725688
# 1   1    6928857
# 2  22    1667906
# 3  22   35207807
# 4  22  685530997
# 5  22   35207807
We zip each key with its values to create (key, value) records, and the reduce operation concatenates the per-key lists into one. The records are then passed to the pd.DataFrame constructor.
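One thing to keep in mind for larger dicts: reduce with + builds a new list at every step. A possible variation, sketched here with itertools.chain.from_iterable from the standard library (a swap-in, not what the code above does), yields the same records in a single pass:

```python
from itertools import chain

import pandas as pd

d = {1: [964725688, 6928857],
     22: [1667906, 35207807, 685530997, 35207807]}

# Lazily pair each key with each of its values, then flatten across keys
records = chain.from_iterable(zip([k] * len(v), v) for k, v in d.items())

df = pd.DataFrame(list(records))
print(df)
```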
I hope this helps.
Upvotes: 1
Reputation: 294258
First Idea
pandas
import numpy as np
import pandas as pd
s = pd.Series(my_dict)
pd.Series(
    np.concatenate(s.values),
    s.index.repeat(s.str.len())
)
1 964725688
1 6928857
22 1667906
22 35207807
22 685530997
22 35207807
dtype: int64
Faster!
numpy
values = list(my_dict.values())
lens = [len(value) for value in values]
keys = list(my_dict.keys())
pd.Series(np.concatenate(values), np.repeat(keys, lens))
1 964725688
1 6928857
22 1667906
22 35207807
22 685530997
22 35207807
dtype: int64
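To check the speed difference on your own data, a minimal timing sketch along these lines should do. The helper names via_pandas and via_numpy are made up for illustration, and the timings depend on the machine and the size of the dict, so no numbers are shown:

```python
import timeit

import numpy as np
import pandas as pd

my_dict = {1: [964725688, 6928857],
           22: [1667906, 35207807, 685530997, 35207807]}

def via_pandas(d):
    # Hypothetical wrapper around the pandas variant
    s = pd.Series(d)
    return pd.Series(np.concatenate(s.values), s.index.repeat(s.str.len()))

def via_numpy(d):
    # Hypothetical wrapper around the numpy variant
    values = list(d.values())
    lens = [len(value) for value in values]
    return pd.Series(np.concatenate(values), np.repeat(list(d.keys()), lens))

# Both variants should agree before their timings are compared
assert via_pandas(my_dict).tolist() == via_numpy(my_dict).tolist()

for fn in (via_pandas, via_numpy):
    seconds = timeit.timeit(lambda: fn(my_dict), number=1000)
    print(f"{fn.__name__}: {seconds:.4f} s for 1000 runs")
```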
Interesting
pd.concat
pd.concat({k: pd.Series(v) for k, v in my_dict.items()}).reset_index(1, drop=True)
1 964725688
1 6928857
22 1667906
22 35207807
22 685530997
22 35207807
dtype: int64
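All three variants return a Series indexed by the dict keys. If a two-column DataFrame is needed, as in the question, one way to get there is sketched below using the numpy variant; the names key and value are placeholders I picked:

```python
import numpy as np
import pandas as pd

my_dict = {1: [964725688, 6928857],
           22: [1667906, 35207807, 685530997, 35207807]}

values = list(my_dict.values())
lens = [len(value) for value in values]
result = pd.Series(np.concatenate(values), np.repeat(list(my_dict.keys()), lens))

# Name the index, then move it into a regular column next to the values
df = result.rename_axis('key').reset_index(name='value')
print(df)
```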
Upvotes: 2
Reputation: 6111
import pandas as pd
my_dict = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}
# One [key, element] row per list element; on Python 2 use my_dict.iteritems()
df = pd.DataFrame([[k, ele] for k, v in my_dict.items() for ele in v])
print(df)
    0          1
0   1  964725688
1   1    6928857
2  22    1667906
3  22   35207807
4  22  685530997
5  22   35207807
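On pandas 0.25 or later, Series.explode does the same flattening without an explicit comprehension. This is only a sketch under that version assumption; the column names key and value are placeholders of mine, and an empty list in the dict would produce NaN and make the integer cast fail:

```python
import pandas as pd

my_dict = {1: [964725688, 6928857],
           22: [1667906, 35207807, 685530997, 35207807]}

df = (
    pd.Series(my_dict)              # one list per key, object dtype
      .explode()                    # one row per list element, key repeated
      .astype('int64')              # explode leaves object dtype behind
      .rename_axis('key')           # hypothetical column name
      .reset_index(name='value')    # hypothetical column name
)
print(df)
```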
Upvotes: 3