user3889486

Reputation: 656

Why is a date assigned to a DataFrame column not a date type?

The df_vol DataFrame is created as follows:

df_vol = df.loc[:, 1].map(fd.retrieve_symbol_datetime).to_frame('maturity')
df_vol['date'] = df_vol.index.date

df_vol.head()
                           maturity        date
2018-11-01 11:31:53.023  2022-04-01  2018-11-01
2018-11-01 16:30:15.287  2022-04-01  2018-11-01
2018-11-01 10:23:06.779  2022-10-01  2018-11-01
2018-11-01 16:30:15.291  2022-10-01  2018-11-01
2018-11-01 11:30:56.251  2018-12-01  2018-11-01

Further inspection of df_vol shows:

df_vol.dtypes
maturity    category
date          object
dtype: object

I would expect the maturity column to be of a date type, since it is filled with the return values of fd.retrieve_symbol_datetime(), a function that returns a pandas.datetime(). Also, the date column is of object type even though it takes its values from index.date.

I'm interested in having datetime types because I eventually want to compute the difference:

pd.eval("(df_vol.maturity - df_vol.date)")

The function retrieve_symbol_datetime() is defined as follows:

def retrieve_symbol_datetime(future: str):
    """
    Retrieves the maturity date of a future whose format is of the form AAAMYY.

    Params
    -------
    future : string, of form 'AAAMYY'
        This format is for futures where 'AAA' is the string that identifies
        the symbol, 'M' is the character that identifies the month, and 'YY' is
        a two-digit number that identifies the year.

    Returns : pandas.datetime
        Returns the maturity date of the future's symbol.

    Example
    -------
    If future = 'DI1Z20', then it returns a pandas.datetime(2020, 12, 1).

    """
    year = 2000 + int(future[4: 6])
    month = convert_letter_symbol_month(future[3: 4])
    return pd.datetime(year, month, 1).date()

Upvotes: 3

Views: 64

Answers (1)

jezrael

Reputation: 862601

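The problem is that maturity is a categorical column. One possible solution is to convert it to strings and then parse those as datetimes. For the date column, use floor on the index to remove the times while keeping a datetime dtype: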

df_vol['maturity'] = pd.to_datetime(df_vol['maturity'].astype(str))
df_vol['date'] = df_vol.index.floor('d')

df_vol['diff'] = (df_vol['maturity'] - df_vol['date']).dt.days
print(df_vol)
                          maturity       date  diff
2018-11-01 11:31:53.023 2022-04-01 2018-11-01  1247
2018-11-01 16:30:15.287 2022-04-01 2018-11-01  1247
2018-11-01 10:23:06.779 2022-10-01 2018-11-01  1430
2018-11-01 16:30:15.291 2022-10-01 2018-11-01  1430
2018-11-01 11:30:56.251 2018-12-01 2018-11-01    30
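
For anyone who wants to reproduce this without the original data, here is a minimal self-contained sketch. The sample timestamps and maturities are made up to resemble the question's output, and the names `idx`, `maturity`, and `fixed` are only for illustration. It assumes the maturity values are plain datetime.date objects stored as a category, which is what the dtypes in the question suggest.

```python
from datetime import date

import pandas as pd

# Hypothetical sample data standing in for the asker's df_vol.
idx = pd.to_datetime([
    "2018-11-01 11:31:53.023",
    "2018-11-01 16:30:15.287",
    "2018-11-01 11:30:56.251",
])
maturity = pd.Series(
    [date(2022, 4, 1), date(2022, 4, 1), date(2018, 12, 1)],
    index=idx,
).astype("category")

df_vol = maturity.to_frame("maturity")
# DatetimeIndex.date yields an array of datetime.date objects,
# so the resulting column has object dtype, not datetime64.
df_vol["date"] = df_vol.index.date
print(df_vol.dtypes)

fixed = df_vol.copy()
# Convert the categories to strings, then parse them as datetimes.
fixed["maturity"] = pd.to_datetime(fixed["maturity"].astype(str))
# floor("D") drops the time of day but keeps a datetime64 dtype.
fixed["date"] = fixed.index.floor("D")
print(fixed.dtypes)

# With both columns as datetime64, subtraction gives a timedelta column.
diff = (fixed["maturity"] - fixed["date"]).dt.days
print(diff.tolist())  # [1247, 1247, 30]
```

The first dtypes printout should show the same category/object pair as in the question, and the second should show datetime64 for both columns.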

Upvotes: 1
