Reputation: 359

How to create a dictionary of lists from two columns in a dataframe

I have a dataframe like this

df = pd.DataFrame(columns = ['A', 'B'])
df.A = [1,1,1,2,2,2,2,4,4,5]
df.B = [5,2,4,3,1,5,4,1,2,2]

What I'm currently using

d = {}
for i in df.A:
    d[i] = []
    for v in df.A[df.A == i].index:
        d[i].append(df.B[v])

Resulting in

{1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}

But it's slow.

What is a pythonic way of doing this?

EDIT:

d = {}
for i in df.A.unique():
    d[i] = df[df.A == i].B.tolist()

It still seems like there must be a faster way.

Thanks for any help!

Upvotes: 8

Answers (3)

reticentroot

Reputation: 3682

To create a simple dictionary from two lists in Python, you can write (there are variations):

mydict = dict(zip(list1, list2)) #assumes len(list1) ==  len(list2)

Here zip() is a Python built-in that pairs up the items at the same position in each list. In Python 2 it returns a list of tuples; in Python 3 it returns an iterator. Passing those pairs to dict() creates a dictionary in which list1 provides the keys and list2 provides the values. The lists should be the same length, because zip() stops at the end of the shorter one. In Python 2 you can also use izip() from the itertools module, which returns an iterator instead of a list. Both are used the same way, but for large lists izip() is more memory efficient because it yields one tuple at a time instead of building the whole list in memory. In Python 3 the built-in zip() already behaves this way and izip() no longer exists. That said, a dictionary holds all of its contents in memory, which is what makes key lookups quick. (Sorry for the tangent.)
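One caveat matters for this question: dict(zip(...)) keeps a single value per key, so repeated keys are overwritten rather than collected into a list. A minimal sketch, using plain lists as stand-ins for the two columns (the names keys, values and last_only are only illustrative):

```python
# Plain lists standing in for df.A and df.B from the question.
keys = [1, 1, 1, 2, 2, 2, 2, 4, 4, 5]
values = [5, 2, 4, 3, 1, 5, 4, 1, 2, 2]

# dict() keeps only the last value seen for each repeated key,
# so this does not build the dictionary of lists the question asks for.
last_only = dict(zip(keys, values))
print(last_only)  # {1: 4, 2: 4, 4: 2, 5: 2}
```

So on its own this pattern only fits when the keys are unique.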

Upvotes: 2

Jon Clements

Reputation: 142216

You can use groupby on the DataFrame and to_dict on the result, which keeps all the heavy work in pandas rather than in Python loops, e.g.:

import pandas as pd

df = pd.DataFrame(columns = ['A', 'B'])
df.A = [1,1,1,2,2,2,2,4,4,5]
df.B = [5,2,4,3,1,5,4,1,2,2]

d = df.groupby('A')['B'].apply(list).to_dict()

Gives you:

{1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}
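One detail that may matter if the key order is important: groupby sorts the group keys by default. A small sketch, assuming you would rather keep keys in order of first appearance, passes sort=False. The sample data and the names sorted_keys and first_seen are made up for illustration:

```python
import pandas as pd

# Column A deliberately starts with the larger key.
df = pd.DataFrame({"A": [2, 2, 1, 2, 1], "B": [10, 20, 30, 40, 50]})

# Default behaviour: group keys come out sorted.
sorted_keys = df.groupby("A")["B"].apply(list).to_dict()

# sort=False keeps the groups in order of first appearance in column A.
first_seen = df.groupby("A", sort=False)["B"].apply(list).to_dict()

print(sorted_keys)
print(first_seen)
```

Both dictionaries hold the same lists; only the order of the keys differs.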

Upvotes: 18

Marcin Fabrykowski

Reputation: 619

Look at this: list to dictionary conversion with multiple values per key?

from collections import defaultdict
d = defaultdict(list)
for i, j in zip(df.A,df.B):
    d[i].append(j)

Is this ok?

EDIT: If you want, you can convert it to simple dict:

d = dict(d)
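A possible reason to do that conversion: looking up a missing key on a defaultdict silently inserts an empty list, while a plain dict raises KeyError. A minimal sketch with plain lists in place of the dataframe columns (the names grouped and plain, and the key 99, are only illustrative):

```python
from collections import defaultdict

# Plain lists standing in for df.A and df.B.
keys = [1, 1, 2]
values = [5, 2, 3]

grouped = defaultdict(list)
for k, v in zip(keys, values):
    grouped[k].append(v)

# Take a plain-dict copy before touching any missing key.
plain = dict(grouped)

# Reading a missing key from the defaultdict creates it as a side effect.
_ = grouped[99]
print(99 in grouped)  # True

# The plain dict is unaffected and raises KeyError instead.
try:
    plain[99]
except KeyError:
    print("plain dict raised KeyError")
```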

Upvotes: 3
