Reputation: 1749
I have 2 dataframes - A and B. A contains weekly sales data for various stores, departments indexed by a key Store_Dept_Date
(eg. 2_12_2010-04-03
)while B contains corresponding Consumer Price Index (CPI) for given store and date indexed as Store_Date
for e.g. 2_2010-04-03
.
> A.columns
> Out [ ] : Index([u'Store', u'Dept', u'Date', u'Weekly_Sales'], dtype='object')
> B.columns
> Out [ ] : Index([u'Store', u'Date', u'CPI'], dtype='object')
I want to normalize the weekly sales given in A by dividing each row of A by corresponding CPI value given in B.
Currently I am trying this:
for ix,row in A.iterrows():
f_index = str(row['Store']) + "_" + row['Date']
A.ix[ix,'Weekly_Sales'] = row['Weekly_Sales']/ B.ix[f_index,'CPI']
A contains 421570 rows. My program takes forever to run. Whats the correct and efficient way of doing it?
Upvotes: 0
Views: 546
Reputation: 3357
The DataFrames' merge method should be faster even though it copies data. You can set the flag copy=False
to minimize unnecessary copying.
If there is one date in B for every date in A, then you can do:
C = A.merge(B, on=['Store', 'Date'], copy=False)
C['Normalized_Sales'] = C.Weekly_Sales / C.CPI
Upvotes: 1