spiff
spiff

Reputation: 1495

Python: Column and Row operations without using loop

I have a the below df1:

  Date        Tickers Qty
01-01-2018    ABC    25
02-01-2018    BCD    25
02-01-2018    XYZ    31
05-01-2018    XYZ    25

and another df2 as below

  Date         ABC  BCD  XYZ
01-01-2018    123   5   78
02-01-2018    125   7   79
03-01-2018    127   6   81
04-01-2018    126   7   82
05-01-2018    124   6   83

I want a resultant column in df1 which is the product of the correct column and row in df2 - getting the right ticker's rate on the given date and let the other dates have nan within df1

  Date       df1['Product']
01-01-2018      3075
02-01-2018      175
02-01-2018      2449
03-01-2018      nan
04-01-2018      nan
05-01-2018      2075

This seems like standard python operation, but I just am unable to achieve this without writing a loop - which is taking a very long time to execute:

I merged the above 2 tables on Date and then ran the below loop

for i in range(len(df1)):
    try:
        df1['Product'][i] = df1[df1['Ticker'][i]][i]
    except ValueError:
        df['Product'][i] = np.nan

Is there any better pythonic way of achieving this and not writing this loop pls?

Upvotes: 4

Views: 838

Answers (2)

jezrael
jezrael

Reputation: 863301

Use:

df11 = df1.pivot('Date', 'Tickers','Qty')
df22 = df2.set_index('Date')

s = df22.mul(df11).bfill(axis=1).iloc[:, 0]
print (s)
Date
01-01-2018     3075.0
02-01-2018      175.0
03-01-2018        NaN
04-01-2018        NaN
05-01-2018     2075.0
Name: ABC, dtype: float64

Solution for add new column to df1:

df11 = df1.pivot('Date', 'Tickers','Qty')
df22 = df2.set_index('Date')

df = df1.join(df22.mul(df11).stack().rename('new'), on=['Date','Tickers'], how='left')
print (df)
         Date Tickers  Qty     new
0  01-01-2018     ABC   25  3075.0
1  02-01-2018     BCD   25   175.0
2  05-01-2018     XYZ   25  2075.0

EDIT:

If pairs Dates with Tickers are duplicated, solution above is not possible use.

print (df1)
         Date Tickers  Qty
0  01-01-2018     ABC   25
1  01-01-2018     ABC   20 <-added duplicated pairs 01-01-2018 and ABC
2  02-01-2018     XYZ   31
3  02-01-2018     BCD   25
4  05-01-2018     XYZ   25

df3 = df1[['Date']].copy()
#add new values to column
df3['new'] = df2.set_index('Date').lookup(df1['Date'], df1['Tickers']) * df1['Qty']
#add missing values to duplicated Dates 
df3 = df2[['Date']].drop_duplicates().merge(df3, how='left') 
print (df3)
         Date     new
0  01-01-2018  3075.0
1  01-01-2018  2460.0
2  02-01-2018  2449.0
3  02-01-2018   175.0
4  03-01-2018     NaN
5  04-01-2018     NaN
6  05-01-2018  2075.0

Upvotes: 9

Pyd
Pyd

Reputation: 6159

you need to set 'Date' as index and multiply,

df1=df1.set_index('Date')
df2=df2.set_index('Date')


df3=(df2['ABC']*df1['Qty']).reset_index() 
print(df3)
    Date        0
0   01-01-2018  3075.0
1   02-01-2018  3125.0
2   03-01-2018  NaN
3   04-01-2018  NaN
4   05-01-2018  3100.0

Upvotes: 3

Related Questions