Valip

Reputation: 4650

How to fix pandas FutureWarning: inferring datetime64[ns] from data containing strings is deprecated

I have the following code, which worked fine until I upgraded pandas to the latest version:

def group_amounts_by_batch(self) -> pd.DataFrame:
    data_frame = (
        self.data_frame[self.columns.amount]
        .groupby(
            [
                self.data_frame[ExtraColumnNames.BatchId],
                self.data_frame[ExtraColumnNames.DepositId],
                self.data_frame[ExtraColumnNames.DepositAmount],
                self.data_frame[ExtraColumnNames.DepositDate],
                self.data_frame[ExtraColumnNames.BatchDate],
                self.data_frame[ExtraColumnNames.NonFundedAmount],
            ]
        )
        .sum()
        .reset_index()
    )
    data_frame = data_frame[data_frame[ExtraColumnNames.DepositId] != ""]
    data_frame = data_frame.round(2)
    return data_frame

But now I'm getting the following warning:

FutureWarning: Inferring datetime64[ns] from data containing strings is deprecated and will be removed in a future version. To retain the old behavior explicitly pass Series(data, dtype={value.dtype})
  .reset_index()

How can I use the suggested solution (explicitly pass Series(data, dtype={value.dtype}) .reset_index()) in my code to fix that warning?

data_frame.dtypes:
Batch Id                    object
Deposit Id                  object
Deposit Amount              object
Deposit Date        datetime64[ns]
Batch Date          datetime64[ns]
NonFunded Amount            object
amount                     float64
dtype: object

Upvotes: 2

Views: 9728

Answers (2)

You should specify the dtype of your data explicitly when you create the DataFrame:

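import numpy as np
import pandas as pd
# col_2 is declared as datetime64[ns], so its values must be parseable dates
values = [(3, '2021-01-04'), (2, '2021-01-03'), (1, '2021-01-02'), (0, '2021-01-01')]
data = np.array(values, dtype=[('col_1', 'object'), ('col_2', 'datetime64[ns]')])
df = pd.DataFrame.from_records(data)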

In your case, the DataFrame was created from the values alone, without a dtype, so pandas had to infer the column types. Define a dtype for your columns and pass it in when you build the data.

You can build the dtype automatically as a list of tuples. Suppose you have two lists: 'cols' with the DataFrame's column names and 'types' with the corresponding column types.

dtype = list(zip(cols, types))
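
To make the idea above concrete, here is a minimal sketch that builds the dtype from two lists and feeds it into a DataFrame. The column names, types and rows are hypothetical and chosen only for illustration; they are not taken from the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical column names and types, for illustration only.
cols = ["deposit_id", "deposit_date"]
types = ["object", "datetime64[ns]"]

# Pair each column name with its type to form a NumPy structured dtype.
dtype = list(zip(cols, types))

# Hypothetical rows; the date strings are converted by NumPy because
# the second field is declared as datetime64[ns].
rows = [("D1", "2021-01-01"), ("D2", "2021-01-02")]
records = np.array(rows, dtype=dtype)

df = pd.DataFrame.from_records(records)
print(df.dtypes)
```

Because the date column is typed before pandas ever sees it, pandas has no strings to infer a datetime type from.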

Upvotes: 2

Vlad V

Reputation: 53

I got a similar warning while parsing an Excel file with string dates:

df = pd.read_excel('path/to/file')

The workaround that got rid of it for me was the following:

df = pd.read_excel('path/to/file', 
                    parse_dates=['col_date1', 'col_date2', ...], 
                    date_parser=lambda x: pd.to_datetime(x, format='%Y-%m-%d'))

So in your case, try wrapping the date columns in the grouping list with pd.to_datetime(). However, it would be better to refactor the part of your code that prepares the initial frame, self.data_frame, and parse the date columns properly there.
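
A minimal sketch of that suggestion, assuming the date columns reach the groupby as strings. The frame, its values and the date format below are hypothetical stand-ins for self.data_frame, not the asker's actual data.

```python
import pandas as pd

# Hypothetical frame standing in for self.data_frame; dates arrive as strings.
frame = pd.DataFrame(
    {
        "Batch Id": ["B1", "B1", "B2"],
        "Deposit Date": ["2021-01-01", "2021-01-01", "2021-01-02"],
        "amount": [10.0, 5.5, 7.25],
    }
)

# Convert the date column explicitly, with a known format, before grouping.
deposit_date = pd.to_datetime(frame["Deposit Date"], format="%Y-%m-%d")

grouped = (
    frame["amount"]
    .groupby([frame["Batch Id"], deposit_date])
    .sum()
    .reset_index()
)
print(grouped)
```

With the conversion done up front, the grouping keys already carry a datetime dtype, so pandas should not need to infer one from strings when reset_index() rebuilds the columns.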

Upvotes: 0
