user1566200

Reputation: 1838

Vectorizing a multiplication and dict mapping on a Pandas DataFrame without iterating?

I have a Pandas DataFrame, df:

import pandas as pd
import numpy as np
import math

df = pd.DataFrame({'A':[1,2,2,4,np.nan],'B':[1,2,3,4,5]})

and a dict, mask:

mask = {1:32,2:64,3:100,4:200}

I want my end result to be a DataFrame like this:

A    B    C
1    1    32
2    2    64
2    3    96
4    4    400
nan  5    nan

Right now I am doing this, which seems inefficient:

for idx, row in df.iterrows():
    if not math.isnan(row['A']):
        if row['A'] != 1:
            df.loc[idx, 'C'] = row['B'] * mask[row['A'] - 1]
        else:
            df.loc[idx, 'C'] = row['B'] * mask[row['A']]

Is there an easy way to vectorize this?

Upvotes: 2

Views: 774

Answers (2)

piRSquared

Reputation: 294498

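This should work. The comparison df.A != 1 gives a boolean Series that counts as 1 wherever A is not 1, so subtracting it from df.A shifts the lookup key down by one for every row except those where A is 1. map(mask) then looks each key up in the dict and leaves NaN as NaN: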

df['C'] = df.B * (df.A - (df.A != 1)).map(mask)
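To make the one-liner easier to follow, here is the same computation split into steps. This is only a sketch on the five-row frame from the question; the intermediate names shift and key are mine, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 4, np.nan], 'B': [1, 2, 3, 4, 5]})
mask = {1: 32, 2: 64, 3: 100, 4: 200}

# Boolean Series: True (counts as 1) wherever A is not 1, including the NaN row.
shift = df.A != 1

# Lookup key: A itself when A == 1, otherwise A - 1. NaN stays NaN.
key = df.A - shift

# map() looks each key up in the dict; a key with no match (NaN here) becomes NaN.
df['C'] = df.B * key.map(mask)
print(df)
```

Because column A contains NaN it is stored as float, so C comes out as float as well.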



Timing

10,000 rows

# Initialize each run with
df = pd.DataFrame({'A':[1,2,2,4,np.nan],'B':[1,2,3,4,5]})
df = pd.concat([df for _ in range(2000)])


100,000 rows

# Initialize each run with
df = pd.DataFrame({'A':[1,2,2,4,np.nan],'B':[1,2,3,4,5]})
df = pd.concat([df for _ in range(20000)])
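If you want to reproduce a comparison like this yourself, a rough sketch with timeit follows. The helper names make_df, with_map and with_apply are hypothetical, and ignore_index=True is my addition so that the repeated rows get unique labels. It prints whatever your machine measures.

```python
import timeit

import numpy as np
import pandas as pd

mask = {1: 32, 2: 64, 3: 100, 4: 200}

def make_df(repeats):
    # Same five-row frame as in the question, repeated `repeats` times.
    base = pd.DataFrame({'A': [1, 2, 2, 4, np.nan], 'B': [1, 2, 3, 4, 5]})
    return pd.concat([base] * repeats, ignore_index=True)

def with_map(df):
    # Vectorized: shift the key with a boolean Series, then map through the dict.
    return df.B * (df.A - (df.A != 1)).map(mask)

def with_apply(df):
    # Row-wise: one Python-level function call per row.
    looked_up = df.apply(
        lambda r: mask.get(r.A) if r.A == 1 else mask.get(r.A - 1), axis=1)
    return looked_up * df.B

df = make_df(2000)  # 10,000 rows; use make_df(20000) for 100,000
for fn in (with_map, with_apply):
    seconds = timeit.timeit(lambda: fn(df), number=3) / 3
    print(f"{fn.__name__}: {seconds:.4f} s per run")
```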


Upvotes: 3

akuiper

Reputation: 215067

Here is an option using apply together with the dictionary's get method, which returns None instead of raising a KeyError when the key is not in the dictionary (as happens for the NaN row):

df['C'] = df.apply(lambda r: mask.get(r.A) if r.A == 1 else mask.get(r.A - 1), axis = 1) * df.B

df    
#   A   B   C
#0  1   1   32
#1  2   2   64
#2  2   3   96
#3  4   4   400
#4  NaN 5   NaN
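The lookup above relies on two details of dict.get that are easy to miss. The following small sketch, using the mask from the question, illustrates them in plain Python:

```python
mask = {1: 32, 2: 64, 3: 100, 4: 200}

# A float key matches the int key it equals, because 1.0 == 1 and both hash alike.
# This matters because column A is stored as float once it contains NaN.
print(mask.get(1.0))           # 32

# NaN is not a key, so get() returns None rather than raising.
print(mask.get(float('nan')))  # None

# Plain indexing with a missing key raises KeyError instead.
try:
    mask[float('nan')]
except KeyError:
    print('KeyError')
```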

Upvotes: 3
