Reputation: 1115
I have a DataFrame in Python that I'm trying to load into Snowflake. The values in the problematic column are either Timestamp objects or NaT. When loaded currently, the column looks like:
And the source of those two rows looks like:
DEACTIVATED_AT
NaT
Timestamp('2020-06-20 01:47:15')
I need to convert the Timestamps to strings, but df['DEACTIVATED_AT'].astype(str) also turns NaT into the literal string 'NaT', and the load into Snowflake then fails because it tries to insert that string into the DB:
ProgrammingError: 100071 (22000): Failed to cast variant value "NaT" to TIMESTAMP_NTZ
How can I leave the NaT values as they are and convert only the Timestamps to strings?
I've tried:
df['DEACTIVATED_AT'] = df['DEACTIVATED_AT'].fillna(np.datetime64('NaT')).astype(str)
# still converts NaT to a string
df['DEACTIVATED_AT'] = df['DEACTIVATED_AT'].apply(lambda v: str(v) if not pd.isnull(v) else v).tolist()
# leaves NaT intact but doesn't convert the Timestamps for some reason
Upvotes: 0
Views: 2288
Reputation: 645
You could try this; I'm not sure whether it is what you are looking for:
df['DEACTIVATED_AT'].dt.strftime('%Y-%m-%d %H:%M:%S').fillna(pd.NaT)
strftime converts DEACTIVATED_AT from datetime to str, and fillna() replaces the resulting NaN with NaT. The column's dtype will be object instead of datetime64[ns].
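To see what the strftime step does to the missing rows, here is a minimal sketch on the two-row sample from the question. The frame is rebuilt here for illustration and the name as_text is made up. It only shows that .dt.strftime leaves the missing entry as a null rather than the text 'NaT'; whether the Snowflake connector then loads that null as NULL is an assumption I have not verified.

```python
import pandas as pd

# Two-row sample mirroring the question: one missing value, one real timestamp.
df = pd.DataFrame(
    {"DEACTIVATED_AT": [pd.NaT, pd.Timestamp("2020-06-20 01:47:15")]}
)

# Format only the real timestamps; the NaT row comes back as a null,
# not as the string 'NaT'.
as_text = df["DEACTIVATED_AT"].dt.strftime("%Y-%m-%d %H:%M:%S")

print(as_text.isna().tolist())  # [True, False]
print(as_text.iloc[1])          # 2020-06-20 01:47:15
```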
Upvotes: 3
Reputation: 2696
import pandas as pd
data = { 'DEACTIVATED_AT' : [pd.NaT, pd.Timestamp('2020-06-20 01:47:15')]}
df = pd.DataFrame(data)
# now do the work
df['test'] = df['DEACTIVATED_AT'].apply(lambda x: x.strftime('%Y-%m-%d') if not pd.isnull(x) else pd.NaT)
will give
       DEACTIVATED_AT        test
0                 NaT         NaT
1 2020-06-20 01:47:15  2020-06-20
and
df['test'] = df['DEACTIVATED_AT'].apply(lambda x: str(x.strftime('%Y-%m-%d %H:%M:%S')) if not pd.isnull(x) else str(pd.NaT))
gives
       DEACTIVATED_AT                 test
0                 NaT                  NaT
1 2020-06-20 01:47:15  2020-06-20 01:47:15
with
df.dtypes
as
DEACTIVATED_AT    datetime64[ns]
test                      object
dtype: object
and
print(type(df['test'][1]))
giving
<class 'str'>
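One thing to watch with the second variant: str(pd.NaT) is the literal text 'NaT', which is the value the question reports Snowflake rejecting. A hedged alternative is to return None for the missing rows instead. The sketch below assumes the loader treats None as a null, which I have not checked against Snowflake, and the helper name to_text_or_none is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"DEACTIVATED_AT": [pd.NaT, pd.Timestamp("2020-06-20 01:47:15")]}
)

def to_text_or_none(value):
    """Hypothetical helper: format a Timestamp as text, map NaT to None."""
    if pd.isnull(value):
        return None
    return value.strftime("%Y-%m-%d %H:%M:%S")

# Build the values in plain Python so pandas does not re-infer a datetime dtype.
converted = [to_text_or_none(v) for v in df["DEACTIVATED_AT"]]

# dtype=object keeps None as None next to the strings.
df["test"] = pd.Series(converted, index=df.index, dtype=object)

print(str(pd.NaT))  # NaT  (a three-character string, not a missing value)
print(converted)    # [None, '2020-06-20 01:47:15']
```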
Upvotes: 0
Reputation: 1115
This is what worked for me: convert everything to strings, using a unique placeholder string in place of the NaT values, and then substitute the placeholder back afterwards:
df['DEACTIVATED_AT'] = df['DEACTIVATED_AT'].fillna('sub').astype(str).replace('sub',np.nan)
I had to use np.nan rather than pd.NaT to avoid this pyarrow error:
ArrowTypeError: ('an integer is required (got type str)', 'Conversion failed for column DEACTIVATED_AT with type object')
Open to more elegant solutions though!
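A possible variation that skips the placeholder string, sketched on the two-row sample from the question (rebuilt here for illustration; the name is_present is made up): record which rows hold a real timestamp first, convert the column to strings, then blank out the other rows with np.nan. It keeps np.nan rather than pd.NaT for the reason given above; I have not run it through pyarrow or Snowflake.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"DEACTIVATED_AT": [pd.NaT, pd.Timestamp("2020-06-20 01:47:15")]}
)

# Remember which rows hold a real timestamp before converting anything.
is_present = df["DEACTIVATED_AT"].notna()

# Convert to strings, then put np.nan back wherever the value was missing.
df["DEACTIVATED_AT"] = df["DEACTIVATED_AT"].astype(str).where(is_present, np.nan)

print(df["DEACTIVATED_AT"].isna().tolist())  # [True, False]
print(df["DEACTIVATED_AT"].iloc[1])          # 2020-06-20 01:47:15
```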
Upvotes: 0