Arnold Klein
Arnold Klein

Reputation: 3086

Map columns in pandas using a dictionary

I need to create a new column, which will contain letters {a,b,c,d} based on rules:

{'a' if (df['q1']==0 & df['q2']==0), 
 'b' if (df['q1']==0 & df['q2']==1),
 'c' if (df['q1']==1 & df['q2']==0),
 'd' if (df['q1']==1 & df['q2']==1)}  

so, the new third column should contain a letter which corresponds to a particular combination of {0,1} in two columns.

         q1      q2
0         0       1
1         0       1
2         0       1
3         0       1
4         0       1
5         0       1
6         0       1
7         0       1
8         0       1
9         0       1
10        1       1
11        1       1
12        0       1
13        0       1
14        1       0
15        0       0
16        0       0
17        0       0
18        0       0
19        0       0
20        0       0
21        0       0

I thought about converting numbers in each row from binary to decimal format and then apply dictionary rules.

Upvotes: 2

Views: 854

Answers (2)

cs95
cs95

Reputation: 403198

Another method with df.map and df.transform:

In [90]: mapping = {(0, 0) :'a', (0, 1) : 'b', (1, 0): 'c', (1, 1): 'd'}

In [91]: df['val'] = df.transform(lambda x: (x['q1'], x['q2']), axis=1).map(mapping); df
Out[91]: 
    q1  q2 val
0    0   1   b
1    0   1   b
2    0   1   b
3    0   1   b
4    0   1   b
5    0   1   b
6    0   1   b
7    0   1   b
8    0   1   b
9    0   1   b
10   1   1   d
11   1   1   d
12   0   1   b
13   0   1   b
14   1   0   c
15   0   0   a
16   0   0   a
17   0   0   a
18   0   0   a
19   0   0   a
20   0   0   a
21   0   0   a

You can also use zip to generate columns, apply pd.Series and then do the mapping:

In [119]: df['val'] = pd.Series(list(zip(df.q1, df.q2))).map(mapping); df
Out[119]: 
    q1  q2 val
0    0   1   b
1    0   1   b
2    0   1   b
3    0   1   b
4    0   1   b
5    0   1   b
6    0   1   b
7    0   1   b
8    0   1   b
9    0   1   b
10   1   1   d
11   1   1   d
12   0   1   b
13   0   1   b
14   1   0   c
15   0   0   a
16   0   0   a
17   0   0   a
18   0   0   a
19   0   0   a
20   0   0   a
21   0   0   a

Performance

jezrael's solution:

In [552]: %%timeit
     ...: idx = pd.MultiIndex.from_product([[0,1],[0,1]], names=('q1','q2'))
     ...: s = pd.Series(['a','b','c','d'], index=idx, name='val')
     ...: df.join(s, on=['q1','q2'])
     ...: 
100 loops, best of 3: 2.84 ms per loop

Proposed in this post:

In [553]: %%timeit
     ...: mapping = {(0, 0) :'a', (0, 1) : 'b', (1, 0): 'c', (1, 1): 'd'}
     ...: df.transform(lambda x: (x['q1'], x['q2']), axis=1).map(mapping)
     ...:  
1000 loops, best of 3: 1.7 ms per loop

Upvotes: 1

jezrael
jezrael

Reputation: 863651

You can use join by Series with MultiIndex:

idx = pd.MultiIndex.from_product([[0,1],[0,1]], names=('q1','q2'))
s = pd.Series(['a','b','c','d'], index=idx, name='val')
print (s)
q1  q2
0   0     a
    1     b
1   0     c
    1     d
Name: val, dtype: object

df = df.join(s, on=['q1','q2'])
print (df)
    q1  q2 val
0    0   1   b
1    0   1   b
2    0   1   b
3    0   1   b
4    0   1   b
5    0   1   b
6    0   1   b
7    0   1   b
8    0   1   b
9    0   1   b
10   1   1   d
11   1   1   d
12   0   1   b
13   0   1   b
14   1   0   c
15   0   0   a
16   0   0   a
17   0   0   a
18   0   0   a
19   0   0   a
20   0   0   a
21   0   0   a

Upvotes: 3

Related Questions