julia

Reputation: 152

Add items to a dictionary of lists

Suppose the following toy data set (from a CSV file whose column names are the "keys"; I'm only interested in some of the rows, which I put in "data"):

keys = ['k1', 'k2', 'k3', 'k4']
data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

I want to get a dictionary with a list for each column, like this:

{'k1': [1, 5, 9, 13], 'k2': [2, 6, 10, 14], 'k3': [3, 7, 11, 15], 'k4': [4, 8, 12, 16]}

In my code I first initialize the dictionary with empty lists and then iterate (in the order of the keys) to append each item to its list.

my_dict = dict.fromkeys(keys, [])
for row in data:
    for i, k in zip(row, keys):
        my_dict[k].append(i)

But it doesn't work. It builds this dictionary:

{'k3': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], 'k2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], 'k1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], 'k4': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]}

You can see that every element ends up in every list, instead of just four elements in each list. If I print i, k inside the loop, it shows the correct pairs of items and keys. So I guess the problem occurs when I append item i to the list for key k.

Does anyone know why all elements are added to all lists and what would be the right way of building my dictionary?

Thanks in advance

Upvotes: 5

Views: 7696

Answers (5)

mgilson

Reputation: 309861

I think that dict(zip(keys, map(list,zip(*data)) )) should do the trick.

First, I transpose your data (zip(*data)), but that returns tuples. Since you want lists, I use map to construct lists from the tuples. Then we use zip again to pair each key with its list, e.g. (key1, list1), (key2, list2), .... This is exactly what the dictionary constructor expects, so you're golden.
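To make the steps in that one-liner visible, here is a minimal sketch that performs them one at a time on the question's data. The intermediate names columns and column_lists are only for illustration and are not part of the original answer.

```python
keys = ['k1', 'k2', 'k3', 'k4']
data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

# Step 1: transpose the rows into columns (each column comes back as a tuple).
columns = list(zip(*data))

# Step 2: turn each tuple into a list.
column_lists = [list(col) for col in columns]

# Step 3: pair each key with its column and build the dictionary.
result = dict(zip(keys, column_lists))
print(result)
```

Wrapping zip in list() in step 1 keeps the sketch working on both Python 2 and Python 3.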

An alternate solution would be to use a collections.defaultdict:

import collections

d = collections.defaultdict(list)
tdata = zip(*data)  # transpose your data
for k, v in zip(keys, tdata):
    d[k].extend(v)  # copy the column's items into the list for key k

Of course, this leaves you with a defaultdict instead of a regular dict, though it can be converted to a regular one trivially: d = dict(d).
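A defaultdict also lets the question's original row-by-row loop work unchanged, because list() is called afresh for each missing key. The following is a sketch under that assumption; the names item and regular are illustrative.

```python
import collections

keys = ['k1', 'k2', 'k3', 'k4']
data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

# Each missing key triggers a new call to list(), so no list is shared.
d = collections.defaultdict(list)
for row in data:
    for item, k in zip(row, keys):
        d[k].append(item)

# Convert back to a plain dict if the defaultdict behaviour is not wanted.
regular = dict(d)
print(regular)
```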

Upvotes: 4

Igor Chubin

Reputation: 64563

Zip it, but transpose it first:

>>> keys = ['k1', 'k2', 'k3', 'k4']
>>> data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
>>> print dict(zip(keys, zip(*data)))
{'k3': (3, 7, 11, 15), 'k2': (2, 6, 10, 14), 'k1': (1, 5, 9, 13), 'k4': (4, 8, 12, 16)}

If you want lists rather than tuples in the dictionary:

>>> print dict(zip(keys, [list(i) for i in zip(*data)]))

And if you want to keep your version, just use a dictionary comprehension instead of fromkeys:

my_dict = { k : [] for k in keys }

The problem in your case is that you initialize every key of my_dict with the same list object:

>>> my_dict = dict.fromkeys(keys, [])
>>> my_dict
{'k3': [], 'k2': [], 'k1': [], 'k4': []}
>>> my_dict['k3'].append(1)
>>> my_dict
{'k3': [1], 'k2': [1], 'k1': [1], 'k4': [1]}

When you do it right (with a dictionary comprehension or a generator expression), each key gets its own list:

>>> my_dict = dict((k, []) for k in keys )
>>> my_dict
{'k3': [], 'k2': [], 'k1': [], 'k4': []}
>>> my_dict['k3'].append(1)
>>> my_dict
{'k3': [1], 'k2': [], 'k1': [], 'k4': []}
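The sharing can be checked directly with the is operator, which compares object identity. This is a small illustrative sketch; the names shared and separate are not from the original code.

```python
keys = ['k1', 'k2', 'k3', 'k4']

# fromkeys stores the one list passed to it under every key.
shared = dict.fromkeys(keys, [])
print(shared['k1'] is shared['k2'])  # True: both keys refer to one list

# The comprehension evaluates [] once per key, creating separate lists.
separate = {k: [] for k in keys}
print(separate['k1'] is separate['k2'])  # False: each key has its own list
```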

Upvotes: 9

jamylak

Reputation: 133524

>>> keys = ['k1', 'k2', 'k3', 'k4']
>>> data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
>>> dict(zip(keys, zip(*data)))
{'k3': (3, 7, 11, 15), 'k2': (2, 6, 10, 14), 'k1': (1, 5, 9, 13), 'k4': (4, 8, 12, 16)}

If you really need lists:

>>> dict(zip(keys, map(list, zip(*data))))
{'k3': [3, 7, 11, 15], 'k2': [2, 6, 10, 14], 'k1': [1, 5, 9, 13], 'k4': [4, 8, 12, 16]}

If you are using Python 2, zip and map return lists. If you are working with a large data set, you can use itertools.izip and itertools.imap to be more efficient and avoid creating the intermediate lists.
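For readers on Python 3 (an assumption, since the note above targets Python 2): there the built-in zip and map already return lazy iterators, and itertools.izip and itertools.imap no longer exist. A minimal sketch, with is_list and result as illustrative names:

```python
keys = ['k1', 'k2', 'k3', 'k4']
data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

# On Python 3, zip returns an iterator rather than a list.
transposed = zip(*data)
is_list = isinstance(transposed, list)
print(is_list)  # False on Python 3

# The columns are produced one at a time while dict() consumes them.
result = dict(zip(keys, map(list, transposed)))
print(result)
```

Note that zip(*data) still needs all rows available to unpack as arguments; only the production of the columns is lazy.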

Upvotes: 0

steffen

Reputation: 8958

That should work:

keys = ['k1', 'k2', 'k3', 'k4']
data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
mydict = {}
i = 0  # index of the column that belongs to the current key
for k in keys:
    mydict[k] = []  # a fresh list for every key
    for l in data:
        mydict[k].append(l[i])
    i += 1

Note that index() is an expensive function. Do not use it when you have a huge data set; increment a variable in that case.

edit: no it doesn't! sorry, just a moment

edit: now it works!

Upvotes: 0

Sven Marnach

Reputation: 601519

You are running into the issue explained in this answer: your dictionary is initialised with the same list object reused for all values. Simply use

dict(zip(keys, zip(*data)))

instead. This will transpose the list of rows into a list of columns, and then zip the keys and columns together.

Upvotes: 7
