FaCoffee

Reputation: 7909

Pandas: create a dictionary with a list of columns as values

Given this DataFrame:

import pandas as pd
first=[0,1,2,3,4]
second=[10.2,5.7,7.4,17.1,86.11]
third=['a','b','c','d','e']
fourth=['z','zz','zzz','zzzz','zzzzz']
df=pd.DataFrame({'first':first,'second':second,'third':third,'fourth':fourth})
df=df[['first','second','third','fourth']]

   first  second third fourth
0      0   10.20     a      z
1      1    5.70     b     zz
2      2    7.40     c    zzz
3      3   17.10     d   zzzz
4      4   86.11     e  zzzzz

I can create a dictionary out of df using

a=df.set_index('first')['second'].to_dict()

so that I can decide which column provides the keys and which provides the values. But what if I want the values to be lists built from several columns, such as second AND third?

If I try this

b=df.set_index('first')[['second','third']].to_dict()

I get a dictionary of dictionaries, keyed by column name first:

{'second': {0: 10.199999999999999,
  1: 5.7000000000000002,
  2: 7.4000000000000004,
  3: 17.100000000000001,
  4: 86.109999999999999},
 'third': {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}}

Instead, I want a dictionary of lists

{0: [10.199999999999999, 'a'],
 1: [5.7000000000000002, 'b'],
 2: [7.4000000000000004, 'c'],
 3: [17.100000000000001, 'd'],
 4: [86.109999999999999, 'e']}

How can I get this result?

Upvotes: 3

Views: 11420

Answers (3)

boot-scootin

Reputation: 12515

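Someone else can probably chime in with a pure-pandas solution, but in a pinch I think this ought to work for you. You'd basically build the dictionary on the fly with a comprehension, looking up the values in each row by label. Note that this relies on df having the default 0..n-1 index, since idx is used as a row label in df.loc.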

d = {df.loc[idx, 'first']: [df.loc[idx, 'second'], df.loc[idx, 'third']] for idx in range(df.shape[0])}

d
Out[5]: 
{0: [10.199999999999999, 'a'],
 1: [5.7000000000000002, 'b'],
 2: [7.4000000000000004, 'c'],
 3: [17.100000000000001, 'd'],
 4: [86.109999999999999, 'e']}

Edit: You could also do this:

df['new'] = list(zip(df['second'], df['third']))

df
Out[25]: 
   first  second third fourth         new
0      0   10.20     a      z   (10.2, a)
1      1    5.70     b     zz    (5.7, b)
2      2    7.40     c    zzz    (7.4, c)
3      3   17.10     d   zzzz   (17.1, d)
4      4   86.11     e  zzzzz  (86.11, e)

df = df[['first', 'new']]

df
Out[27]: 
   first         new
0      0   (10.2, a)
1      1    (5.7, b)
2      2    (7.4, c)
3      3   (17.1, d)
4      4  (86.11, e)

df.set_index('first').to_dict()
Out[28]: 
{'new': {0: (10.199999999999999, 'a'),
  1: (5.7000000000000002, 'b'),
  2: (7.4000000000000004, 'c'),
  3: (17.100000000000001, 'd'),
  4: (86.109999999999999, 'e')}}

In this approach, you first create the list (or tuple) you want to keep and then "drop" the other columns. This is basically your original approach, modified.

And if you really wanted lists instead of tuples, just map the list type onto that 'new' column:

df['new'] = list(map(list, zip(df['second'], df['third'])))
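To get from the 'new' column to the flat dictionary asked for in the question, one option is to select that single column before calling to_dict(), so the call runs on a Series rather than a DataFrame. This is a minimal sketch that rebuilds the example data from the question; the name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'first': [0, 1, 2, 3, 4],
    'second': [10.2, 5.7, 7.4, 17.1, 86.11],
    'third': ['a', 'b', 'c', 'd', 'e'],
})

# Build the column of lists, as in the line above.
df['new'] = list(map(list, zip(df['second'], df['third'])))

# Selecting one column gives a Series; Series.to_dict() maps index -> value,
# so there is no outer {'new': ...} level.
result = df.set_index('first')['new'].to_dict()
print(result)
```

Compared with calling to_dict() on the whole frame, this avoids the extra level keyed by the column name.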

Upvotes: 3

EdChum

Reputation: 393963

You can zip the values:

In [118]:
b=df.set_index('first')[['second','third']].values.tolist()
dict(zip(df['first'],b))

Out[118]:
{0: [10.2, 'a'], 1: [5.7, 'b'], 2: [7.4, 'c'], 3: [17.1, 'd'], 4: [86.11, 'e']}
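One thing to keep in mind with this approach: the keys should come from the values of the first column, not from the frame's row labels. The two happen to coincide in the example data. The sketch below uses a hypothetical frame with a non-default index (the labels 100, 101, 102 are made up for illustration) to show the difference.

```python
import pandas as pd

# Hypothetical data: same columns as the question, but custom row labels.
df = pd.DataFrame(
    {'first': [0, 1, 2], 'second': [10.2, 5.7, 7.4], 'third': ['a', 'b', 'c']},
    index=[100, 101, 102],
)

b = df[['second', 'third']].values.tolist()

# Keys taken from the row labels of the frame.
by_label = dict(zip(df.index, b))

# Keys taken from the values stored in the 'first' column.
by_value = dict(zip(df['first'], b))

print(by_label)
print(by_value)
```

Only by_value is keyed by the contents of first; by_label is keyed by 100, 101 and 102.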

Upvotes: 1

jezrael

Reputation: 862511

You can convert the selected columns to a NumPy array with values, turn it into a list of lists with tolist(), zip it with column first, and convert the result to a dict:

a = dict(zip(df['first'], df[['second','third']].values.tolist()))
print(a)
{0: [10.2, 'a'], 1: [5.7, 'b'], 2: [7.4, 'c'], 3: [17.1, 'd'], 4: [86.11, 'e']}
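For comparison, a variant that stays inside pandas is to transpose the indexed frame and call to_dict with orient='list'. This is only a sketch on the question's example data; the name result is illustrative, and transposing a mixed-dtype frame can be slow on large inputs, so the zip version above may well be preferable there.

```python
import pandas as pd

df = pd.DataFrame({
    'first': [0, 1, 2, 3, 4],
    'second': [10.2, 5.7, 7.4, 17.1, 86.11],
    'third': ['a', 'b', 'c', 'd', 'e'],
})

# After set_index + transpose, each value of 'first' becomes a column label,
# and orient='list' maps every column label to the list of its values.
result = df.set_index('first')[['second', 'third']].T.to_dict(orient='list')
print(result)
```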

Upvotes: 1
