Reputation: 119

Pandas: transform two columns of lists into a column of dictionaries with repeated keys

I have a pandas DataFrame called self.data. It has two columns, name and value, and I want to generate a new column containing a dictionary built from them. For example:

Name	Value	New Dict Column
[a, b, c, a]	[1, 2, 3, 4]	{a: [1, 4], b: [2], c: [3]}
[b, b, a]	[1, 2, 3]	{b: [1, 2], a: [3] }

At the moment I have the following code:

self.data['dict'] = self.data[['name', 'value']].apply(lambda x: dict(zip(*x)), axis=1)

The problem with this attempt is that when a name is repeated, its value is replaced each time. In the example, I can't keep both values for a (1 and 4); the final dictionary only stores the last one.
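To make the overwriting concrete, here is a minimal illustration using the first row of the example. The variable names are made up for this sketch; dict() keeps a single entry per key, so a later pair with the same name replaces the earlier one.

```python
# Illustration only: first row of the example from the question.
names = ['a', 'b', 'c', 'a']
values = [1, 2, 3, 4]

# dict() keeps one entry per key, so the second 'a' overwrites the first.
overwritten = dict(zip(names, values))
print(overwritten)  # {'a': 4, 'b': 2, 'c': 3}
```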

Thank you in advance!

Upvotes: 1

Answers (3)

jezrael

Reputation: 862521

Use a custom function with defaultdict if performance is important:

from collections import defaultdict

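def f(x):
    # x holds one row: (list of names, list of values)
    d = defaultdict(list)
    for y, z in zip(*x):
        # append each value under its name, so repeated names keep all values
        d[y].append(z)
    return d

df['New Dict Column'] = [f(x) for x in df[['column1', 'column2']].to_numpy()]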
print(df)
        column1       column2                    New Dict Column
0  [a, b, c, a]  [1, 2, 3, 4]  {'a': [1, 4], 'b': [2], 'c': [3]}
1     [b, b, a]     [1, 2, 3]            {'b': [1, 2], 'a': [3]}
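One thing to keep in mind: f returns a defaultdict, and looking up a missing key in a defaultdict silently adds that key. If that matters for your use case, a possible variant is to convert to a plain dict before returning. The sketch below assumes a hypothetical helper name, group_pairs, which is not part of the answer above.

```python
from collections import defaultdict

def group_pairs(names, values):
    # Same grouping idea as f above, but returns a plain dict (hypothetical helper).
    d = defaultdict(list)
    for name, value in zip(names, values):
        d[name].append(value)
    return dict(d)

# A defaultdict grows when a missing key is merely looked up.
as_default = defaultdict(list, {'a': [1, 4]})
as_default['z']
print(dict(as_default))  # {'a': [1, 4], 'z': []}

# A plain dict does not; a missing key would raise KeyError instead.
plain = group_pairs(['a', 'b', 'c', 'a'], [1, 2, 3, 4])
print(plain)         # {'a': [1, 4], 'b': [2], 'c': [3]}
print('z' in plain)  # False
```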

Performance is good: in the test below it is about 10 times faster than the apply-based solution:

# 20k rows for the test
df = pd.concat([df] * 10000, ignore_index=True)


In [211]: %timeit df.apply(lambda data: {k: [y for x, y in zip(data[0], data[1]) if x == k] for k in data[0]}, axis=1)
532 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [212]: %timeit  [ f(x) for x in df[['column1','column2']].to_numpy()]
53.8 ms ± 596 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
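If you are not in IPython, a rough way to repeat this comparison is the standard timeit module. This is only a sketch: the sample frame is rebuilt from the example data, the row count is reduced to keep the run short, and the function names with_apply and with_list_comprehension are made up for illustration. Absolute timings will differ by machine and pandas version.

```python
import timeit
from collections import defaultdict

import pandas as pd

def f(x):
    d = defaultdict(list)
    for y, z in zip(*x):
        d[y].append(z)
    return d

base = pd.DataFrame({
    'column1': [['a', 'b', 'c', 'a'], ['b', 'b', 'a']],
    'column2': [[1, 2, 3, 4], [1, 2, 3]],
})
# 2,000 rows here to keep the run short; the test above used 20k.
df = pd.concat([base] * 1000, ignore_index=True)

def with_apply():
    # Row-wise apply with a dict comprehension (columns accessed by label).
    return df.apply(
        lambda row: {k: [y for x, y in zip(row['column1'], row['column2']) if x == k]
                     for k in row['column1']},
        axis=1,
    )

def with_list_comprehension():
    # Plain Python loop over the underlying object array.
    return [f(x) for x in df[['column1', 'column2']].to_numpy()]

# Both approaches should build the same dictionaries.
assert with_apply().tolist() == with_list_comprehension()

print('apply:', timeit.timeit(with_apply, number=3))
print('list comprehension:', timeit.timeit(with_list_comprehension, number=3))
```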

Upvotes: 1

U13-Forward

Reputation: 71570

Try something like this with apply:

df['New Dict Column'] = df.apply(lambda data: {k: [y for x, y in zip(data['Name'], data['Value']) if x == k] for k in data['Name']}, axis=1)
print(df)

Output:

           Name         Value                    New Dict Column
0  [a, b, c, a]  [1, 2, 3, 4]  {'a': [1, 4], 'b': [2], 'c': [3]}
1     [b, b, a]     [1, 2, 3]            {'b': [1, 2], 'a': [3]}

Upvotes: 3

Antjes

Reputation: 120

You can use apply across several columns with the following pattern:

import pandas as pd
df = pd.DataFrame({'Name' :[['a', 'b', 'c', 'a'], ['b', 'b', 'a']], 
                   'Value' :[['a1', 'b1', 'c1', 'a2'], ['b1', 'b2', 'a1']]})
                   
print(df)

def get_dict(row):
    my_dict = {}
    for x in row['Name']:
        my_dict[x] = row['Value']
    return my_dict
    

df['my_dict'] = df.apply(get_dict, axis=1)

print(df)

PS: Note that this code does not yet pick the right elements of Value for each element of Name; every key is mapped to the whole Value list. You will need to implement that part.
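One possible way to fill in the part the PS leaves open is to walk Name and Value in parallel and collect the values per name with dict.setdefault. This is a sketch of that idea, reusing the sample data from this answer.

```python
import pandas as pd

df = pd.DataFrame({'Name': [['a', 'b', 'c', 'a'], ['b', 'b', 'a']],
                   'Value': [['a1', 'b1', 'c1', 'a2'], ['b1', 'b2', 'a1']]})

def get_dict(row):
    my_dict = {}
    # Walk Name and Value in parallel so each value lands under its own name.
    for name, value in zip(row['Name'], row['Value']):
        # setdefault creates the list the first time a name is seen.
        my_dict.setdefault(name, []).append(value)
    return my_dict

df['my_dict'] = df.apply(get_dict, axis=1)
print(df['my_dict'].tolist())
# [{'a': ['a1', 'a2'], 'b': ['b1'], 'c': ['c1']}, {'b': ['b1', 'b2'], 'a': ['a1']}]
```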

Upvotes: 0
