How to convert time to string and leave NaT as-is in Python df?

I have a pandas DataFrame that I'm trying to load into Snowflake. The problematic column contains either Timestamp values or NaT. The source of the two rows in question looks like:

DEACTIVATED_AT
NaT
Timestamp('2020-06-20 01:47:15')

I need to convert the Timestamps to strings, but df['DEACTIVATED_AT'].astype(str) also turns NaT into the literal string 'NaT', which Snowflake then fails to insert:

ProgrammingError: 100071 (22000): Failed to cast variant value "NaT" to TIMESTAMP_NTZ

How can I leave the NaT values as they are and convert only the Timestamps to strings?

I've tried:

df['DEACTIVATED_AT'] = df['DEACTIVATED_AT'].fillna(np.datetime64('NaT')).astype(str)
# still converts NaT to the string 'NaT'

df['DEACTIVATED_AT'] = df['DEACTIVATED_AT'].apply(lambda v: str(v) if not pd.isnull(v) else v).tolist()
# leaves NaT intact, but the Timestamps are not converted for some reason

Upvotes: 0

Answers (3)

ABC

Reputation: 645

You could try this, though I'm not sure it is exactly what you are looking for:

df['DEACTIVATED_AT'].dt.strftime('%Y-%m-%d %H:%M:%S').fillna(pd.NaT)

This converts DEACTIVATED_AT from datetime to str.
Because NaT becomes NaN during the conversion, fillna() is used to replace NaN with NaT again.
Note, however, that the resulting column has dtype object instead of datetime64[ns].
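
To see what the conversion does to the missing value, here is a small sketch using the two sample rows from the question. The variable name as_text is only illustrative, and the exact dtype of the result may differ between pandas versions:

```python
import pandas as pd

# Two sample rows taken from the question.
df = pd.DataFrame(
    {'DEACTIVATED_AT': [pd.NaT, pd.Timestamp('2020-06-20 01:47:15')]}
)

# .dt.strftime formats real timestamps and leaves NaT as a missing value.
as_text = df['DEACTIVATED_AT'].dt.strftime('%Y-%m-%d %H:%M:%S')

print(as_text.isna().tolist())  # [True, False]
print(as_text.iloc[1])          # 2020-06-20 01:47:15
```

The missing row stays missing rather than becoming the string 'NaT', which is the part that astype(str) gets wrong for this use case.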

Upvotes: 3

Paul Brennan

Reputation: 2696

import pandas as pd
data = { 'DEACTIVATED_AT' : [pd.NaT, pd.Timestamp('2020-06-20 01:47:15')]} 
df = pd.DataFrame(data)

# now do the work
df['test'] = df['DEACTIVATED_AT'].apply(lambda x: x.strftime('%Y-%m-%d') if not pd.isnull(x) else pd.NaT)

will give

    DEACTIVATED_AT      test
0   NaT                 NaT
1   2020-06-20 01:47:15 2020-06-20

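and, if every value should be a str (note that str(pd.NaT) is the literal string 'NaT', which is what the question is trying to keep out of the database),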

df['test'] = df['DEACTIVATED_AT'].apply(lambda x: str(x.strftime('%Y-%m-%d %H:%M:%S')) if not pd.isnull(x) else str(pd.NaT))

gives

    DEACTIVATED_AT      test
0   NaT                 NaT
1   2020-06-20 01:47:15 2020-06-20 01:47:15

with

df.dtypes

DEACTIVATED_AT    datetime64[ns]
test                      object
dtype: object

and

print(type(df['test'][1]))

giving

<class 'str'>

Upvotes: 0

Mark McGown

Reputation: 1115

This is what worked for me: fill the NaT values with a unique placeholder string, convert everything to strings, then swap the placeholder back out:

df['DEACTIVATED_AT'] = df['DEACTIVATED_AT'].fillna('sub').astype(str).replace('sub',np.nan)

I had to use np.nan rather than pd.NaT to avoid this pyarrow error:

ArrowTypeError: ('an integer is required (got type str)', 'Conversion failed for column DEACTIVATED_AT with type object')

Open to more elegant solutions though!
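
One possible alternative, offered as a sketch rather than a tested Snowflake recipe: format only the real timestamps and put None where the value is NaT, so no placeholder string is needed. The helper name timestamps_to_text is made up for illustration, and the assumption that the loader maps None to NULL has not been verified here.

```python
import pandas as pd


def timestamps_to_text(series, fmt='%Y-%m-%d %H:%M:%S'):
    """Return an object-dtype Series of formatted strings, with None for NaT.

    Hypothetical helper, not part of pandas.
    """
    values = [None if pd.isna(v) else v.strftime(fmt) for v in series]
    # dtype=object keeps None as None instead of coercing it to NaN/NaT.
    return pd.Series(values, index=series.index, dtype=object)


df = pd.DataFrame(
    {'DEACTIVATED_AT': [pd.NaT, pd.Timestamp('2020-06-20 01:47:15')]}
)
df['DEACTIVATED_AT'] = timestamps_to_text(df['DEACTIVATED_AT'])

print(df['DEACTIVATED_AT'].tolist())  # [None, '2020-06-20 01:47:15']
```

If the loader turns out to need np.nan instead of None (as with the pyarrow error above), the None in the list comprehension can be swapped accordingly.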

Upvotes: 0
