jangles
jangles

Reputation: 323

How to put two pandas dataframes together, with one dataframe missing dates

I have historical ticker data in one dataframe, and historical dividend data in another dataframe. They look like this:

        Date        High        Low         Open        Close       Volume      Adj Close
0       1993-01-29  43.968750   43.750000   43.968750   43.937500   1003200.0   26.453930
1       1993-02-01  44.250000   43.968750   43.968750   44.250000   480500.0    26.642057
2       1993-02-02  44.375000   44.125000   44.218750   44.343750   201300.0    26.698507
3       1993-02-03  44.843750   44.375000   44.406250   44.812500   529400.0    26.980742
4       1993-02-04  45.093750   44.468750   44.968750   45.000000   531500.0    27.093624
... ... ... ... ... ... ... ...
6809    2020-02-13  338.119995  335.559998  335.859985  337.059998  54501900.0  337.059998
6810    2020-02-14  337.730011  336.200012  337.510010  337.600006  64582200.0  337.600006
6811    2020-02-18  337.670013  335.209991  336.510010  336.730011  57226200.0  336.730011
6812    2020-02-19  339.079987  337.480011  337.790009  338.339996  48639100.0  338.339996
6813    2020-02-20  338.640015  333.681702  337.742004  336.894989  63193703.0  336.894989
            action      value
2019-12-20  DIVIDEND    1.570
2019-09-20  DIVIDEND    1.384
2019-06-21  DIVIDEND    1.432
2019-03-15  DIVIDEND    1.233
2018-12-21  DIVIDEND    1.435
... ... ...
1994-03-18  DIVIDEND    0.271
1993-12-17  DIVIDEND    0.317
1993-09-17  DIVIDEND    0.286
1993-06-18  DIVIDEND    0.318
1993-03-19  DIVIDEND    0.213

I want something like this:

        Date        High        Low         Open        Close       Volume      Adj Close       Dividend
0       1993-01-29  43.968750   43.750000   43.968750   43.937500   1003200.0   26.453930           
1       1993-02-01  44.250000   43.968750   43.968750   44.250000   480500.0    26.642057           
2       1993-02-02  44.375000   44.125000   44.218750   44.343750   201300.0    26.698507           
3       1993-02-03  44.843750   44.375000   44.406250   44.812500   529400.0    26.980742           
4       1993-02-04  45.093750   44.468750   44.968750   45.000000   531500.0    27.093624           
... ... ... ... ... ... ... ...
6809    2020-02-13  338.119995  335.559998  335.859985  337.059998  54501900.0  337.059998          1.570
6810    2020-02-14  337.730011  336.200012  337.510010  337.600006  64582200.0  337.600006          1.570
6811    2020-02-18  337.670013  335.209991  336.510010  336.730011  57226200.0  336.730011          1.570
6812    2020-02-19  339.079987  337.480011  337.790009  338.339996  48639100.0  338.339996          1.570
6813    2020-02-20  338.640015  333.681702  337.742004  336.894989  63193703.0  336.894989          1.570

(I have no dividend data before 1993-03-19, but I do onward.)

Basically, I want to merge them so that the dividend column is continuous with the dates in the historical dataframe.

Upvotes: 0

Views: 744

Answers (4)

Vrushabh Bhangod
Vrushabh Bhangod

Reputation: 1

price_table = price_table.merge(dividend_table, left_on = 'Date', right_on ='Date',how = 'outer')
price.value = price.value.fillna('')

If the dates are not in the same format convert using:

price_table['Date'] = pd.to_datetime(price_table['Date'])

Upvotes: 0

fjsh
fjsh

Reputation: 13

You can join, then fill missing values and finally rename columns.

Supose your dataframes are named market_prices and dividend_data.

You can do this:

import pandas as pd

# Make sure the keys share type
market_prices['Date'] = pd.to_datetime(market_prices['Date'])
dividend_data.index = pd.to_datetime(dividend_data.index)

# Join the data
prices_with_dividends = market_prices.join(
    dividend_data['value'], how='left', on='Date'
)
# Fill missing values with previous data
prices_with_dividends['value'].fillna(method='ffill', inplace=True)
# Rename column
prices_with_dividends.rename({'value': 'Dividend'}, axis=1, inplace=True)

That should do.

Upvotes: 1

jangles
jangles

Reputation: 323

Interpolated from @fjsh's answer.

# Join the data
prices_with_dividends = features.set_index('Date').join(dividend_data.set_index('Date'))

# Fill missing values with previous data
prices_with_dividends['Value'].fillna(method='ffill', inplace=True)

# Rename column
prices_with_dividends.rename({'Value': 'Dividend'}, axis=1, inplace=True)

Upvotes: 0

Georgina Skibinski
Georgina Skibinski

Reputation: 13377

Assuming df is the data frame with historical ticker data, and df2 is the one with dividend information.

#prerequisite:
df["Date"]=pd.to_datetime(df["Date"])

df=df.set_index("Date")
df["Dividend"]=df2["value"]
df=df.reset_index()

Upvotes: 1

Related Questions