Ajax
Ajax

Reputation: 1749

Pandas divide entries of a column by entries from another data frame

I have 2 dataframes - A and B. A contains weekly sales data for various stores, departments indexed by a key Store_Dept_Date (eg. 2_12_2010-04-03 )while B contains corresponding Consumer Price Index (CPI) for given store and date indexed as Store_Date for e.g. 2_2010-04-03 .

> A.columns  
> Out [ ] : Index([u'Store', u'Dept', u'Date', u'Weekly_Sales'], dtype='object')

> B.columns  
> Out [ ] : Index([u'Store', u'Date', u'CPI'], dtype='object')

I want to normalize the weekly sales given in A by dividing each row of A by corresponding CPI value given in B.

Currently I am trying this:

for ix,row in A.iterrows():
  f_index = str(row['Store']) + "_" + row['Date']
  A.ix[ix,'Weekly_Sales'] = row['Weekly_Sales']/ B.ix[f_index,'CPI']

A contains 421570 rows. My program takes forever to run. Whats the correct and efficient way of doing it?

Upvotes: 0

Views: 546

Answers (1)

slushy
slushy

Reputation: 3357

The DataFrames' merge method should be faster even though it copies data. You can set the flag copy=False to minimize unnecessary copying.

If there is one date in B for every date in A, then you can do:

C = A.merge(B, on=['Store', 'Date'], copy=False)
C['Normalized_Sales'] = C.Weekly_Sales / C.CPI

Upvotes: 1

Related Questions