Reputation: 359
I have a dataframe like this
df = pd.DataFrame(columns = ['A', 'B'])
df.A = [1,1,1,2,2,2,2,4,4,5]
df.B = [5,2,4,3,1,5,4,1,2,2]
What I'm currently using
d = {}
for i in df.A:
    d[i] = []
    for v in df.A[df.A == i].index:
        d[i].append(df.B[v])
Resulting in
{1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}
But it's slow.
What is a pythonic way of doing this?
EDIT:
d = {}
for i in df.A.unique():
    d[i] = df[df.A == i].B.tolist()
Still seems like there must be a faster way
Thanks for any help!
Upvotes: 8
Views: 16346
Reputation: 3682
To create a simple dictionary from two lists in Python, you can write (there are variations):
mydict = dict(zip(list1, list2)) #assumes len(list1) == len(list2)
Here zip() is a Python built-in that takes one item from each list at the same position and pairs them into tuples. Passing those tuples to dict() creates a dictionary in which list1 provides the keys and list2 provides the values. Both lists should have the same length, because zip() stops at the end of the shorter one. In Python 2, zip() returns a full list of tuples, and izip() from the itertools module is the more memory-efficient variant because it returns an iterator that yields one tuple at a time; in Python 3, zip() already returns an iterator and izip() no longer exists. That said, a dictionary holds all of its contents in memory anyway, which is what makes key lookups quick (sorry for the tangent). Note that dict() keeps only the last value for a repeated key, so with data like column A in the question this gives one value per key rather than a list of values.
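To make the behaviour of dict(zip(...)) with repeated keys concrete, here is a minimal sketch using plain lists. The names keys, values, flat and grouped are made up for illustration; they stand in for the question's columns A and B.

```python
# Illustrative stand-ins for the two columns; note the repeated key 1.
keys = [1, 1, 2]
values = [5, 2, 3]

# dict() keeps only the last value seen for each key,
# so the pair (1, 5) is overwritten by (1, 2).
flat = dict(zip(keys, values))

# To keep every value, collect them into a list per key instead.
grouped = {}
for k, v in zip(keys, values):
    grouped.setdefault(k, []).append(v)

print(flat)     # {1: 2, 2: 3}
print(grouped)  # {1: [5, 2], 2: [3]}
```

So dict(zip(...)) on its own only fits the case where every key is unique.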
Upvotes: 2
Reputation: 142216
You can use a DataFrame's groupby and to_dict methods, which'll keep all the heavy work done in pandas and not in Python loops, e.g.:
import pandas as pd
df = pd.DataFrame(columns = ['A', 'B'])
df.A = [1,1,1,2,2,2,2,4,4,5]
df.B = [5,2,4,3,1,5,4,1,2,2]
d = df.groupby('A')['B'].apply(list).to_dict()
Gives you:
{1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}
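For comparison, here is a small sketch of a few equivalent spellings of the same grouping, assuming a reasonably recent pandas. The variable names via_apply, via_agg and via_iter are illustrative only, and which variant is fastest depends on your data and pandas version, so it is worth timing them yourself.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2, 2, 4, 4, 5],
    "B": [5, 2, 4, 3, 1, 5, 4, 1, 2, 2],
})

# The form used above: build a list per group, then convert to a dict.
via_apply = df.groupby("A")["B"].apply(list).to_dict()

# The same aggregation spelled with agg().
via_agg = df.groupby("A")["B"].agg(list).to_dict()

# Iterating the grouped column yields (key, Series) pairs,
# so a dict comprehension works as well.
via_iter = {key: group.tolist() for key, group in df.groupby("A")["B"]}

print(via_apply)  # {1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}
```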
Upvotes: 18
Reputation: 619
Look at this: list to dictionary conversion with multiple values per key?
from collections import defaultdict
d = defaultdict(list)
for i, j in zip(df.A, df.B):
    d[i].append(j)
Is this OK?
EDIT: If you want, you can convert it to simple dict:
d = dict(d)
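Here is a self-contained sketch of the same idea that runs without pandas. The lists a and b are stand-ins for df.A and df.B, and the name plain is made up for the example. It also shows one reason you might want the final dict() conversion: a defaultdict silently creates an entry when you look up a missing key.

```python
from collections import defaultdict

# Plain lists standing in for the question's df.A and df.B.
a = [1, 1, 1, 2, 2, 2, 2, 4, 4, 5]
b = [5, 2, 4, 3, 1, 5, 4, 1, 2, 2]

d = defaultdict(list)
for i, j in zip(a, b):
    d[i].append(j)  # a missing key starts out as an empty list

# Convert to a regular dict so a missing key raises KeyError
# instead of quietly adding an empty list.
plain = dict(d)

print(plain)  # {1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}

d[99]                # lookup on the defaultdict inserts 99 -> []
print(99 in d)       # True
print(99 in plain)   # False
```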
Upvotes: 3