Reputation: 4650
I have the following code, which worked fine until I upgraded pandas to the latest version:
def group_amounts_by_batch(self) -> pd.DataFrame:
    data_frame = (
        self.data_frame[self.columns.amount]
        .groupby(
            [
                self.data_frame[ExtraColumnNames.BatchId],
                self.data_frame[ExtraColumnNames.DepositId],
                self.data_frame[ExtraColumnNames.DepositAmount],
                self.data_frame[ExtraColumnNames.DepositDate],
                self.data_frame[ExtraColumnNames.BatchDate],
                self.data_frame[ExtraColumnNames.NonFundedAmount],
            ]
        )
        .sum()
        .reset_index()
    )
    data_frame = data_frame[data_frame[ExtraColumnNames.DepositId] != ""]
    data_frame = data_frame.round(2)
    return data_frame
But now I'm getting the following warning:
FutureWarning: Inferring datetime64[ns] from data containing strings is deprecated and will be removed in a future version. To retain the old behavior explicitly pass Series(data, dtype={value.dtype})
  .reset_index()
How can I apply the suggested fix (explicitly pass Series(data, dtype={value.dtype})) in my code to get rid of that warning?
data_frame.dtypes:
Batch Id                    object
Deposit Id                  object
Deposit Amount              object
Deposit Date        datetime64[ns]
Batch Date          datetime64[ns]
NonFunded Amount            object
amount                     float64
dtype: object
Upvotes: 2
Views: 9728
Reputation: 324
You should specify the dtype argument for your data when you create your DataFrame:
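import numpy as np
import pandas as pd
# The second field must hold values that NumPy can parse as dates.
values = [(3, '2021-01-01'), (2, '2021-01-02'), (1, '2021-01-03'), (0, '2021-01-04')]
data = np.array(values, dtype=[('col_1', 'object'), ('col_2', 'datetime64[ns]')])
df = pd.DataFrame.from_records(data)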
In your case, you created your DataFrame from the values alone, without a dtype. Define your own dtype and pass it in your code.
You can build the dtype automatically as a list of tuples. Suppose you have two lists: 'cols' with the DataFrame's column names and 'types' with the corresponding column types.
# Pair each column name with its type: [(col, type), ...]
dtype = list(zip(cols, types))
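As a minimal sketch of the approach above: the column names, types, and sample rows below are made up for illustration and are not taken from the question. The dtype list is built from the two lists and passed to a NumPy structured array before the DataFrame is created.

```python
import numpy as np
import pandas as pd

# Hypothetical column names and their NumPy dtypes (illustrative only).
cols = ["batch_id", "deposit_date", "amount"]
types = ["object", "datetime64[ns]", "float64"]

# Pair each column with its type: [("batch_id", "object"), ...]
dtype = list(zip(cols, types))

# Made-up rows; the dates are ISO strings that NumPy can parse.
rows = [
    ("B1", "2021-01-01", 10.5),
    ("B2", "2021-01-02", 20.0),
]

# Build a structured array with explicit field types, then the DataFrame.
data = np.array(rows, dtype=dtype)
df = pd.DataFrame.from_records(data)
print(df.dtypes)
```

Because the date column already has a datetime type when the DataFrame is built, pandas should not need to infer it from strings later on.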
Upvotes: 2
Reputation: 53
I got a similar warning while parsing an Excel file with string dates:
df = pd.read_excel('path/to/file')
The workaround that got rid of it for me was the following:
df = pd.read_excel('path/to/file',
                   parse_dates=['col_date1', 'col_date2', ...],
                   date_parser=lambda x: pd.to_datetime(x, format='%Y-%m-%d'))
So in your case, try wrapping the date columns in the grouping list with pd.to_datetime(). That said, it is worth refactoring the part of your code that builds the initial self.data_frame so that the date columns are parsed properly there.
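A rough sketch of what wrapping a date column in the grouping list could look like. The frame below is made up; only the column labels echo the dtypes listing in the question, and whether this silences the warning in your setup depends on where the string dates actually come from.

```python
import pandas as pd

# Made-up frame; the date column is deliberately stored as strings.
frame = pd.DataFrame(
    {
        "Batch Id": ["B1", "B1", "B2"],
        "Deposit Date": ["2021-01-01", "2021-01-01", "2021-01-02"],
        "amount": [10.0, 5.5, 20.0],
    }
)

grouped = (
    frame["amount"]
    .groupby(
        [
            frame["Batch Id"],
            # Convert explicitly so pandas does not have to infer datetimes.
            pd.to_datetime(frame["Deposit Date"], format="%Y-%m-%d"),
        ]
    )
    .sum()
    .reset_index()
)
print(grouped)
```

Passing an explicit format keeps the conversion predictable and avoids per-value format guessing.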
Upvotes: 0