Gurdish
Gurdish

Reputation: 109

Finding Min and Max Datetime of row

I have a CSV that has several columns that contain Datetime data. After removing rows that are "Invalid", I want to be able to find the Min and Max Datetime value of each row and and place this result as a new columns. However I seem to be getting a Future Warning error when I attempt to add the code for these 2 new columns.

FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError.  Select only valid columns before calling the reduction.

Here is my script:

import pandas as pd
import os
import glob

# Datetime columns
dt_columns = ['DT1','DT2','DT3']

df = pd.read_csv('full_list.csv', dtype=object)

# Remove "invalid" Result rows 
remove_invalid = df['Result'].str.contains('Invalid')
df.drop(index=df[remove_invalid].index, inplace=True)

for eachCol in dt_columns:
    df[eachCol] = pd.to_datetime(df[eachCol]).dt.strftime('%Y-%m-%d %H:%M:%S')

# Create new columns
df['MinDatetime'] = df[dt_columns].min(axis=1)
df['MaxDatetime'] = df[dt_columns].max(axis=1)

# Save as CSV
df.to_csv('test.csv', index=False)

For context, the CSV looks something like this (usually contains 100000+ rows):

Area    Serial  Route   DT1                 DT2                 DT3                 Result
Brazil  17763   4       13/08/2021 23:46:31 16/10/2021 14:04:27 28/10/2021 08:19:59 Confirmed
China   28345   2       15/09/2021 03:09:21 24/04/2021 09:56:34 04/05/2021 22:07:13 Confirmed
Malta   13630   5       21/03/2021 11:59:27 18/09/2021 11:03:25 02/07/2021 02:32:48 Invalid
Serbia  49478   2       12/04/2021 06:38:05 19/03/2021 03:16:47 29/06/2021 06:39:30 Confirmed
France  34732   1       29/04/2021 03:03:14 24/03/2021 01:49:48 26/04/2021 06:44:21 Invalid
Mexico  21840   3       23/11/2021 12:53:33 10/01/2022 02:42:48 29/04/2021 14:22:51 Invalid
Ukraine 20468   3       04/11/2021 18:40:44 13/11/2021 03:38:39 11/03/2021 09:09:14 Invalid
China   28830   1       07/02/2021 23:50:34 03/12/2021 14:04:32 14/07/2021 22:59:10 Confirmed
India   49641   4       02/06/2021 11:17:35 09/05/2021 13:51:55 19/01/2022 06:56:07 Confirmed
Greece  43163   3       30/11/2021 09:31:29 28/01/2021 08:52:50 12/05/2021 07:49:48 Invalid

I want the minimum and maximum value of each row as a new column but at the moment the code produces the 2 columns with no data in. What am I doing wrong?

Upvotes: 2

Views: 175

Answers (1)

jezrael
jezrael

Reputation: 863711

Better filtering without drop:

# Remove "invalid" Result rows 
remove_invalid = df['Result'].str.contains('Invalid')
df = df[~remove_invalid]

You need working with datetimes, so cannot after converting use Series.dt.strftime for strings repr of datetimes:

for eachCol in dt_columns:
    df[eachCol] = pd.to_datetime(df[eachCol], dayfirst=True)

Upvotes: 2

Related Questions