lrh09
lrh09

Reputation: 587

Python pandas alternative to 'map' for function of 2 variables

I would like to put 2 tables into 1, namely corporate announcement date and it's price on that day (on another table)

I have 2 dataframes with the following columns

df1: date, announcement, ticker

date        ticker   announcement
25/4/2013   AAPL     Change in Boardroom
25/4/2013   GOOG     OTHERS
25/4/2013   AMZN     Change in Audit Committee

df2: date, ticker, price

date        ticker   announcement
22/3/2012   AAPL     100.00
23/3/2012   AAPL     102.30
24/3/2012   AAPL     105.40
...
...


def getPrice(dt,tk):
    try:
        return df2[(df2['date']>=dt)&(df2['ticker']==tk)].sort_values(by='date')['price'].values[0]
    except:
        return 0

prices_array = list(map(getPrice,df1['date'].values,df1['ticker'].values))

df1['price'] = prices_array

For the "map" function, the whole process takes very long time. I would love to use apply for df, but I only know how to use apply with lambda function which does not contain "if, then".

I want an extra column like this:

date        ticker   announcement               price
25/4/2013   AAPL     Change in Boardroom        124.10
25/4/2013   GOOG     OTHERS                     50.85
25/4/2013   AMZN     Change in Audit Committee  102.20

Any suggestion on quick ways to do this? Or can do this in a short amount of time?

Thank you and appreciate your help

Upvotes: 0

Views: 943

Answers (2)

Mark
Mark

Reputation: 984

I think you can use '.join'

df.set_index(['date', 'ticker']).join(df2.set_index(['date', 'ticker'])

Upvotes: 1

gilch
gilch

Reputation: 11681

You can create your function for DataFrame.apply() with def instead of lambda. Define it in advance and then put the function name where the lambda would go, same as you did for map().

The applied function gets a Series argument, so you can get the whole row.

Upvotes: 0

Related Questions