Reputation: 21
My current dataset is as follows:
To provide more details to my column headers, I renamed the headers as follows:
cleaned_df.rename(columns={'month':'yyyy_mm','no_of_rainy_days':'no_of_rainy_days (days)', 'total_rainfall':'total_rainfall (mm)', 'mean_rh':'monthly_mean_humidity (%)', 'mean_sunshine_hrs':'mean_sunshine_hrs (hr)', 'cpi':'consumer_price_index (%)'})
As seen in image 2, my month column is of an object data type and I would like to convert it to a datetime64[ns] format.
I tried the following code:
cleaned_df['month'] = pd.to_datetime(cleaned_df['month']).dt.strftime('%Y-%m')
and it gives me the format of YYYY-MM as I wanted - image 1.
However, when I checked its information again, it still shows as an object. Additionally, the code throws a warning:
/var/folders/90/274n3kt55bl4fjg8t134196r0000gn/T/ipykernel_841/3350504036.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
cleaned_df['month'] = pd.to_datetime(cleaned_df['month']).dt.strftime('%Y-%m')
How do I properly code this according to best practice and to ensure the object gets converted?
Upvotes: 1
Views: 1146
Reputation: 13267
Example Code
s = pd.Series(['2020-01', '2020-02'])
s
0 2020-01
1 2020-02
dtype: object <-- chk dtype
Step 1
pd.to_datetime
: converts the strings to datetime64[ns]
pd.to_datetime(s)
result:
0 2020-01-01
1 2020-02-01
dtype: datetime64[ns] <-- chk dtype
Step 2
dt.strftime(format)
: converts the datetimes back to strings in the given format
pd.to_datetime(s).dt.strftime('%Y-%m')
result:
0 2020-01
1 2020-02
dtype: object <-- chk dtype
It's not that your code has no effect; it does exactly what you wrote.
Your code converts str (object) to datetime and then, because of dt.strftime, back to str (object).
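If the goal is to keep a datetime dtype, one option is to stop after pd.to_datetime and drop the dt.strftime call. The sketch below is only an illustration under assumptions: raw_df, its values, and the filter that produces cleaned_df are hypothetical stand-ins, since the original data and the step that created cleaned_df are not shown. It also assumes the SettingWithCopyWarning comes from cleaned_df being a slice of another DataFrame, in which case taking an explicit .copy() is a common way to avoid it.

```python
import pandas as pd

# Hypothetical stand-in for the original data (values are made up for illustration).
raw_df = pd.DataFrame(
    {"month": ["2020-01", "2020-02"], "total_rainfall": [88.4, 65.0]}
)

# Assumption: cleaned_df was created by slicing another DataFrame.
# An explicit copy makes later column assignments unambiguous.
cleaned_df = raw_df[raw_df["total_rainfall"] > 0].copy()

# Stop after to_datetime; no strftime, so the column keeps a datetime dtype.
cleaned_df["month"] = pd.to_datetime(cleaned_df["month"], format="%Y-%m")
print(cleaned_df["month"].dtype)

# Optional: a monthly Period column displays as YYYY-MM without being a string.
cleaned_df["month_period"] = cleaned_df["month"].dt.to_period("M")
print(cleaned_df["month_period"].dtype)
```

A datetime column always carries a day component (shown as the first of the month here), so a YYYY-MM display needs either a Period column as above or formatting with dt.strftime only at output time.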
Upvotes: 1