Reputation: 1193
I have two dataframes, one with news and the other with stock price. Both the dataframes have a "Date" column. I want to merge them on a gap of 5 days.
Lets say my news dataframe is df1 and the other price dataframe as df2.
My df1 looks like this:
News_Dates News
2018-09-29 Huge blow to ABC Corp. as they lost the 2012 tax case
2018-09-30 ABC Corp. suffers a loss
2018-10-01 ABC Corp to Sell stakes
2018-12-20 We are going to comeback strong said ABC CEO
2018-12-22 Shares are down massively for ABC Corp.
My df2 looks like this:
Dates Price
2018-10-04 120
2018-12-24 131
First method of merging I do is:
pd.merge_asof(df1_zscore.sort_values(by=['Dates']), df_n.sort_values(by=['News_Dates']), left_on=['Dates'], right_on=['News_Dates'] \
tolerance=pd.Timedelta('5d'), direction='backward')
The resulting df is:
Dates News_Dates News Price
2018-10-04 2018-10-01 ABC Corp to Sell stakes 120
2018-12-24 2018-12-22 Shares are down massively for ABC Corp. 131
The second way of merging I do is:
pd.merge_asof(df_n.sort_values(by=['Dates']), df1_zscore.sort_values(by=['Dates']), left_on=['News_Dates'], right_no=['Dates'] \
tolerance=pd.Timedelta('5d'), direction='forward').dropna()
And the resulting df as:
News_Dates News Dates Price
2018-09-29 Huge blow to ABC Corp. as they lost the 2012 tax case 2018-10-04 120
2018-09-30 ABC Corp. suffers a loss 2018-10-04 120
2018-10-01 ABC Corp to Sell stakes 2018-10-04 120
2018-12-22 Shares are down massively for ABC Corp. 2018-12-24 131
Both the merging results in separate dfs, however there are values in both the cases which are missing, like for second case for 4th October price, news from 29th, 30th Sept should have been merged. And in case 2 for 24th December price 20th December should also have been merged.
So I'm not quite able to figure out where am I going wrong.
P.S. My objective is to merge the price df with the news df that have come in the last 5 days from the price date.
Upvotes: 0
Views: 3351
Reputation: 5451
Here is my solution using numpy
df_n = pd.DataFrame([('2018-09-29', 'Huge blow to ABC Corp. as they lost the 2012 tax case'), ('2018-09-30', 'ABC Corp. suffers a loss'), ('2018-10-01', 'ABC Corp to Sell stakes'), ('2018-12-20', 'We are going to comeback strong said ABC CEO'), ('2018-12-22', 'Shares are down massively for ABC Corp.')], columns=('News_Dates', 'News'))
df1_zscore = pd.DataFrame([('2018-10-04', '120'), ('2018-12-24', '131')], columns=('Dates', 'Price'))
df_n["News_Dates"] = pd.to_datetime(df_n["News_Dates"])
df1_zscore["Dates"] = pd.to_datetime(df1_zscore["Dates"])
n_dates = df_n["News_Dates"].values
p_dates = df1_zscore[["Dates"]].values
## substract each pair of n_dates and p_dates and create a matrix
mat_date_compare = (p_dates - n_dates).astype('timedelta64[D]')
## get matrix of boolean for which difference is between 0 and 5 day
## to be used as index for original array
comparision = (mat_date_compare <= pd.Timedelta("5d")) & (mat_date_compare >= pd.Timedelta("0d"))
## get cell numbers which is in range 0 to matrix size which meets the condition
ind = np.arange(len(n_dates)*len(p_dates))[comparision.ravel()]
## calculate row and column index from cell number to index the df
pd.concat([df1_zscore.iloc[ind//len(n_dates)].reset_index(drop=True),
df_n.iloc[ind%len(n_dates)].reset_index(drop=True)], sort=False, axis=1)
Result
Dates Price News_Dates News
0 2018-10-04 120 2018-09-29 Huge blow to ABC Corp. as they lost the 2012 t...
1 2018-10-04 120 2018-09-30 ABC Corp. suffers a loss
2 2018-10-04 120 2018-10-01 ABC Corp to Sell stakes
3 2018-12-24 131 2018-12-20 We are going to comeback strong said ABC CEO
4 2018-12-24 131 2018-12-22 Shares are down massively for ABC Corp.
Upvotes: 0
Reputation: 1167
You can swap the left and right dataframe:
df = pd.merge_asof(
df1,
df2,
left_on='News_Dates',
right_on='Dates',
tolerance=pd.Timedelta('5D'),
direction='nearest'
)
df = df[['Dates', 'News_Dates', 'News', 'Price']]
print(df)
Dates News_Dates News Price
0 2018-10-04 2018-09-29 Huge blow to ABC Corp. as they lost the 2012 t... 120
1 2018-10-04 2018-09-30 ABC Corp. suffers a loss 120
2 2018-10-04 2018-10-01 ABC Corp to Sell stakes 120
3 2018-12-24 2018-12-20 We are going to comeback strong said ABC CEO 131
4 2018-12-24 2018-12-22 Shares are down massively for ABC Corp. 131
Upvotes: 2