mike.depetriconi

Reputation: 69

Set indices of dataframe as a single key in dictionary

I have a dataframe such as:

import pandas as pd
df = pd.DataFrame({'index': [0, 0, 0, 0, 0, 1,1,1,1,1, 2,2,2,2], 'value': ['val1', 'val2', 'val3', 'val4', 'val5', 'val6','val7','val8','val9','val10', 'val11','val12','val13','val14']})

I'd like to get a dictionary in which each distinct index value becomes a key that maps to the list of its values, e.g. key 0 maps to ['val1', 'val2', 'val3', 'val4', 'val5'].

Any idea how to do it? I've been using 'to_dict', but it looks like it doesn't do what I need.

Upvotes: 1

Views: 56

Answers (3)

BENY

Reputation: 323236

Using itertools

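import itertools
# itertools.groupby only groups consecutive rows, so sort by 'index' first
# (a stable sort keeps the original order of values within each key)
l = df.sort_values('index', kind='mergesort').values.tolist()
d = {k: [x[1] for x in g] for k, g in itertools.groupby(l, lambda x: x[0])}
d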
{0: ['val1', 'val2', 'val3', 'val4', 'val5'], 1: ['val6', 'val7', 'val8', 'val9', 'val10'], 2: ['val11', 'val12', 'val13', 'val14']}
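A small sketch of why the sort matters here: itertools.groupby starts a new group every time the key changes, so with unsorted rows a later run of the same key replaces the earlier one in the dict comprehension. The `rows` list below is a made-up three-row example, not the data from the question.

```python
import itertools
from operator import itemgetter

# Hypothetical rows in [index, value] form, deliberately not sorted by index.
rows = [[0, 'val1'], [1, 'val6'], [0, 'val2']]

# Without sorting, key 0 appears in two separate runs; the second run
# overwrites the first when the dict is built.
unsorted_result = {
    k: [v for _, v in g]
    for k, g in itertools.groupby(rows, key=itemgetter(0))
}

# sorted() is stable, so values keep their original order within each key.
sorted_rows = sorted(rows, key=itemgetter(0))
sorted_result = {
    k: [v for _, v in g]
    for k, g in itertools.groupby(sorted_rows, key=itemgetter(0))
}

print(unsorted_result)  # {0: ['val2'], 1: ['val6']}
print(sorted_result)    # {0: ['val1', 'val2'], 1: ['val6']}
```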

Upvotes: 1

cs95

Reputation: 402513

Use groupby and apply, followed by a final to_dict call.

df.groupby('index').value.apply(list).to_dict()
# {0: ['val1', 'val2', 'val3', 'val4', 'val5'],
#  1: ['val6', 'val7', 'val8', 'val9', 'val10'],
#  2: ['val11', 'val12', 'val13', 'val14']}

Another option is to iterate over your rows and append to values in a dictionary using setdefault.

d = {}
for k, v in zip(df['index'], df.value):
    d.setdefault(k, []).append(v)

print(d)
# {0: ['val1', 'val2', 'val3', 'val4', 'val5'],
#  1: ['val6', 'val7', 'val8', 'val9', 'val10'],
#  2: ['val11', 'val12', 'val13', 'val14']}

My tests indicate that this is actually faster than groupby for moderately sized frames. It also keeps the keys in the order they first appear, whereas groupby sorts the group keys by default (pass sort=False to disable that); the order of values within each group is preserved in both cases.
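To make the ordering difference concrete, here is a minimal sketch on a frame whose keys are not already sorted. The `df_unsorted` frame and its values are invented for illustration; only the key order of the resulting dictionaries differs, the contents are the same.

```python
import pandas as pd

# Hypothetical frame whose 'index' column is not in sorted order.
df_unsorted = pd.DataFrame({
    'index': [2, 0, 2, 1, 0],
    'value': ['a', 'b', 'c', 'd', 'e'],
})

# Plain loop with setdefault: keys appear in first-seen order.
d_loop = {}
for k, v in zip(df_unsorted['index'], df_unsorted['value']):
    d_loop.setdefault(k, []).append(v)

# groupby with the default sort=True: keys come out sorted.
d_sorted = df_unsorted.groupby('index')['value'].apply(list).to_dict()

# groupby with sort=False: keys stay in first-seen order.
d_first_seen = (
    df_unsorted.groupby('index', sort=False)['value'].apply(list).to_dict()
)

print(list(d_loop))        # [2, 0, 1]
print(list(d_sorted))      # [0, 1, 2]
print(list(d_first_seen))  # [2, 0, 1]
```

Since dictionaries compare equal regardless of key order, all three results are equal under ==; the difference only shows up when iterating or printing.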

Upvotes: 2

Andrii Zarubin

Reputation: 2255

I can think of something like:

import pandas as pd
df = pd.DataFrame({'index': [0, 0, 0, 0, 0, 1,1,1,1,1, 2,2,2,2], 'value': ['val1', 'val2', 'val3', 'val4', 'val5', 'val6','val7','val8','val9','val10', 'val11','val12','val13','val14']})
df.groupby(by='index').apply(lambda x: list(x['value'])).to_dict()

Output is:

{0: ['val1', 'val2', 'val3', 'val4', 'val5'],
 1: ['val6', 'val7', 'val8', 'val9', 'val10'],
 2: ['val11', 'val12', 'val13', 'val14']}

Upvotes: 2
