Reputation: 587
I would like to put 2 tables into 1, namely corporate announcement date and it's price on that day (on another table)
I have 2 dataframes with the following columns
df1: date, announcement, ticker
date ticker announcement
25/4/2013 AAPL Change in Boardroom
25/4/2013 GOOG OTHERS
25/4/2013 AMZN Change in Audit Committee
df2: date, ticker, price
date ticker announcement
22/3/2012 AAPL 100.00
23/3/2012 AAPL 102.30
24/3/2012 AAPL 105.40
...
...
def getPrice(dt,tk):
try:
return df2[(df2['date']>=dt)&(df2['ticker']==tk)].sort_values(by='date')['price'].values[0]
except:
return 0
prices_array = list(map(getPrice,df1['date'].values,df1['ticker'].values))
df1['price'] = prices_array
For the "map" function, the whole process takes very long time. I would love to use apply for df, but I only know how to use apply with lambda function which does not contain "if, then".
I want an extra column like this:
date ticker announcement price
25/4/2013 AAPL Change in Boardroom 124.10
25/4/2013 GOOG OTHERS 50.85
25/4/2013 AMZN Change in Audit Committee 102.20
Any suggestion on quick ways to do this? Or can do this in a short amount of time?
Thank you and appreciate your help
Upvotes: 0
Views: 943
Reputation: 984
I think you can use '.join'
df.set_index(['date', 'ticker']).join(df2.set_index(['date', 'ticker'])
Upvotes: 1
Reputation: 11681
You can create your function for DataFrame.apply()
with def
instead of lambda
. Define it in advance and then put the function name where the lambda would go, same as you did for map()
.
The applied function gets a Series
argument, so you can get the whole row.
Upvotes: 0