Mark McGown

Reputation: 1115

How to convert time to string and leave NaT as-is in Python df?

I have a pandas DataFrame that I'm trying to load into Snowflake. The values in the problematic column are currently either Timestamp objects or literal NaT. When loaded, that column currently looks like:

[Screenshot of the loaded column]

And the source of those two rows looks like:

DEACTIVATED_AT
NaT
Timestamp('2020-06-20 01:47:15')

I need to convert the Timestamps to strings, but df['DEACTIVATED_AT'].astype(str) also turns NaT into the literal string 'NaT', which Snowflake then tries to insert and fails to cast:

ProgrammingError: 100071 (22000): Failed to cast variant value "NaT" to TIMESTAMP_NTZ

How can I leave the NaTs as they are and convert only the Timestamps to strings?

I've tried:

df['DEACTIVATED_AT'] = df['DEACTIVATED_AT'].fillna(np.datetime64('NaT')).astype(str)
# still converts NaT to the string 'NaT'
df['DEACTIVATED_AT'] = df['DEACTIVATED_AT'].apply(lambda v: str(v) if not pd.isnull(v) else v).tolist()
# leaves NaT intact, but the Timestamps don't end up as strings for some reason

Upvotes: 0

Views: 2288

Answers (3)

ABC

Reputation: 645

You could try this; I'm not sure it's exactly what you're looking for:

df['DEACTIVATED_AT'].dt.strftime('%Y-%m-%d %H:%M:%S').fillna(pd.NaT)
  • Convert DEACTIVATED_AT from datetime to str.
  • Because NaT becomes NaN during the conversion, use fillna() to replace the NaN with NaT again.
  • However, the resulting column's dtype will be object instead of datetime64[ns].
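A minimal sketch that checks the NaT-to-NaN behaviour described in these bullets on the question's two sample rows. The final `where(..., None)` step is an added variation, not part of this answer: it swaps the missing marker for `None`, on the unverified assumption that the loader writes Python `None` as SQL NULL. If your loader expects NaT instead, `.fillna(pd.NaT)` as in the answer is the alternative.

```python
import pandas as pd

# Two sample rows from the question: one NaT, one Timestamp.
df = pd.DataFrame({"DEACTIVATED_AT": [pd.NaT, pd.Timestamp("2020-06-20 01:47:15")]})

# Vectorised formatting: each Timestamp becomes a string, and NaT comes back
# as a missing value (NaN) rather than the literal string 'NaT'.
formatted = df["DEACTIVATED_AT"].dt.strftime("%Y-%m-%d %H:%M:%S")

# Assumption: the loader writes Python None as SQL NULL. Cast to object first
# so the missing slot can hold None, then keep the strings where a value exists.
as_str_or_none = formatted.astype(object).where(formatted.notna(), None)

print(as_str_or_none.tolist())
```

Casting to object before `where` matters: on an object column, `None` is stored as-is instead of being coerced back to NaN.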

Upvotes: 3

Paul Brennan

Reputation: 2696

import pandas as pd
data = { 'DEACTIVATED_AT' : [pd.NaT, pd.Timestamp('2020-06-20 01:47:15')]} 
df = pd.DataFrame(data)

# now do the work
df['test'] = df['DEACTIVATED_AT'].apply(lambda x: x.strftime('%Y-%m-%d') if not pd.isnull(x) else pd.NaT)

will give

    DEACTIVATED_AT      test
0   NaT                 NaT
1   2020-06-20 01:47:15 2020-06-20

and

df['test'] = df['DEACTIVATED_AT'].apply(lambda x: str(x.strftime('%Y-%m-%d %H:%M:%S')) if not pd.isnull(x) else str(pd.NaT))

gives

    DEACTIVATED_AT      test
0   NaT                 NaT
1   2020-06-20 01:47:15 2020-06-20 01:47:15

with

df.dtypes

as

DEACTIVATED_AT    datetime64[ns]
test                      object
dtype: object

and

print(type(df['test'][1]))

giving

<class 'str'>
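Note that the second lambda returns `str(pd.NaT)` for missing values, which is the literal string `'NaT'` that Snowflake rejects in the question. Below is a minimal element-wise sketch of the same idea that returns `None` for missing values instead. The helper name `to_str_or_none` is hypothetical, and treating `None` as SQL NULL on load is an assumption about the loader, not something confirmed in this thread.

```python
import pandas as pd

df = pd.DataFrame({"DEACTIVATED_AT": [pd.NaT, pd.Timestamp("2020-06-20 01:47:15")]})


def to_str_or_none(ts):
    """Hypothetical helper: format a Timestamp as text, map NaT to None."""
    return ts.strftime("%Y-%m-%d %H:%M:%S") if pd.notna(ts) else None


# Iterating a datetime64 column yields Timestamp/NaT objects. Pinning
# dtype=object keeps the result as plain Python strings and None.
df["test"] = pd.Series(
    [to_str_or_none(ts) for ts in df["DEACTIVATED_AT"]],
    index=df.index,
    dtype=object,
)

print(df["test"].tolist())
```

This keeps the full time component (the first lambda above keeps only the date) and avoids producing the string `'NaT'`.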

Upvotes: 0

Mark McGown

Reputation: 1115

This is what worked for me: fill the NaTs with a unique placeholder string, convert everything to strings, then substitute the placeholder back:

df['DEACTIVATED_AT'] = df['DEACTIVATED_AT'].fillna('sub').astype(str).replace('sub',np.nan)

I had to use np.nan rather than pd.NaT to avoid this pyarrow error:

ArrowTypeError: ('an integer is required (got type str)', 'Conversion failed for column DEACTIVATED_AT with type object')

Open to more elegant solutions though!
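The placeholder round trip can also be written without a sentinel. Stringify the column, then put `np.nan` back wherever the original value was missing, using the NaT mask from the untouched column. This is a minimal sketch of the pandas side only, on the question's sample rows. It sidesteps any chance of a real value colliding with the placeholder `'sub'`, but the pyarrow/Snowflake load itself is not exercised here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"DEACTIVATED_AT": [pd.NaT, pd.Timestamp("2020-06-20 01:47:15")]})
col = df["DEACTIVATED_AT"]

# astype(str) turns every value into text (NaT becomes 'NaT'); mask() then
# replaces the entries that were NaT in the original column with np.nan.
df["DEACTIVATED_AT"] = col.astype(str).mask(col.isna(), np.nan)

print(df["DEACTIVATED_AT"].tolist())
```

Because the mask comes from `col.isna()` on the original datetime column, it does not depend on what string `astype(str)` happens to produce for NaT.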

Upvotes: 0
