Reputation: 109
I have a CSV that has several columns that contain Datetime data. After removing rows that are "Invalid", I want to be able to find the Min and Max Datetime value of each row and and place this result as a new columns. However I seem to be getting a Future Warning error when I attempt to add the code for these 2 new columns.
FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
Here is my script:
import pandas as pd
import os
import glob
# Datetime columns
dt_columns = ['DT1','DT2','DT3']
df = pd.read_csv('full_list.csv', dtype=object)
# Remove "invalid" Result rows
remove_invalid = df['Result'].str.contains('Invalid')
df.drop(index=df[remove_invalid].index, inplace=True)
for eachCol in dt_columns:
df[eachCol] = pd.to_datetime(df[eachCol]).dt.strftime('%Y-%m-%d %H:%M:%S')
# Create new columns
df['MinDatetime'] = df[dt_columns].min(axis=1)
df['MaxDatetime'] = df[dt_columns].max(axis=1)
# Save as CSV
df.to_csv('test.csv', index=False)
For context, the CSV looks something like this (usually contains 100000+ rows):
Area Serial Route DT1 DT2 DT3 Result
Brazil 17763 4 13/08/2021 23:46:31 16/10/2021 14:04:27 28/10/2021 08:19:59 Confirmed
China 28345 2 15/09/2021 03:09:21 24/04/2021 09:56:34 04/05/2021 22:07:13 Confirmed
Malta 13630 5 21/03/2021 11:59:27 18/09/2021 11:03:25 02/07/2021 02:32:48 Invalid
Serbia 49478 2 12/04/2021 06:38:05 19/03/2021 03:16:47 29/06/2021 06:39:30 Confirmed
France 34732 1 29/04/2021 03:03:14 24/03/2021 01:49:48 26/04/2021 06:44:21 Invalid
Mexico 21840 3 23/11/2021 12:53:33 10/01/2022 02:42:48 29/04/2021 14:22:51 Invalid
Ukraine 20468 3 04/11/2021 18:40:44 13/11/2021 03:38:39 11/03/2021 09:09:14 Invalid
China 28830 1 07/02/2021 23:50:34 03/12/2021 14:04:32 14/07/2021 22:59:10 Confirmed
India 49641 4 02/06/2021 11:17:35 09/05/2021 13:51:55 19/01/2022 06:56:07 Confirmed
Greece 43163 3 30/11/2021 09:31:29 28/01/2021 08:52:50 12/05/2021 07:49:48 Invalid
I want the minimum and maximum value of each row as a new column but at the moment the code produces the 2 columns with no data in. What am I doing wrong?
Upvotes: 2
Views: 175
Reputation: 863711
Better filtering without drop
:
# Remove "invalid" Result rows
remove_invalid = df['Result'].str.contains('Invalid')
df = df[~remove_invalid]
You need working with datetimes, so cannot after converting use Series.dt.strftime
for strings repr of datetimes:
for eachCol in dt_columns:
df[eachCol] = pd.to_datetime(df[eachCol], dayfirst=True)
Upvotes: 2