Reputation: 2929
I want to create a pandas dataframe df like:
df = pd.DataFrame(
{
"group": ["A", "A", "A", "A", "A"],
"date": ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
"value": [10, 20, 16, 31, 56],
}
)
I want to specify column "date" as dtype=datetime64[ns] while creating the dataframe, not afterwards.
So not like this:
df = pd.DataFrame(
{
"group": ["A", "A", "A", "A", "A"],
"date": ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
"value": [10, 20, 16, 31, 56],
}
)
df["date"] = pd.to_datetime(df["date"])
But like this:
df = pd.DataFrame(
{
"group": ["A", "A", "A", "A", "A"],
"date": pd.Series(
["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
dtype=np.datetime64,
),
"value": [10, 20, 16, 31, 56],
}
)
but that gives the error:
ValueError: The 'datetime64' dtype has no unit. Please pass in 'datetime64[ns]' instead.
When I follow that suggestion:
df = pd.DataFrame(
{
"group": ["A", "A", "A", "A", "A"],
"date": pd.Series(
["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
dtype=np.datetime64[ns],
),
"value": [10, 20, 16, 31, 56],
}
)
I get this error:
NameError: name 'ns' is not defined
So how can I set a specific column as type "datetime64[ns]"?
Upvotes: 4
Views: 3989
Reputation: 24314
Use dtype='datetime64[ns]':
df=pd.DataFrame(
{
"group": ["A", "A", "A", "A", "A"],
"date": pd.Series(
["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
dtype='datetime64[ns]',
),
"value": [10, 20, 16, 31, 56],
}
)
Or use astype():
df = pd.DataFrame(
{
"group": ["A", "A", "A", "A", "A"],
"date": ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
"value": [10, 20, 16, 31, 56],
}
)
df['date'] = df['date'].astype("datetime64[ns]")
Or build the values via pd.Timestamp():
df = pd.DataFrame(
{
"group": ["A", "A", "A", "A", "A"],
"date": list(map(pd.Timestamp,["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"])),
"value": [10, 20, 16, 31, 56],
}
)
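As a quick check, here is a minimal sketch comparing these options on a shortened list of dates. The variable names (via_string, via_np_dtype, and so on) are made up for illustration. It also shows np.dtype("datetime64[ns]") as one more spelling: the original np.datetime64[ns] fails because Python tries to look up a variable called ns, so the unit has to live inside a string or a dtype object. The resolution pandas infers for the pd.Timestamp variant may depend on the pandas version, so only the explicitly requested dtypes are expected to be exactly datetime64[ns].

```python
import numpy as np
import pandas as pd

dates = ["2020-01-02", "2020-01-13", "2020-02-01"]

# The unit is part of a string (or a np.dtype built from one),
# not a Python subscript on np.datetime64.
via_string = pd.Series(dates, dtype="datetime64[ns]")
via_np_dtype = pd.Series(dates, dtype=np.dtype("datetime64[ns]"))

# Convert an existing string Series after the fact.
via_astype = pd.Series(dates).astype("datetime64[ns]")

# Let pandas infer a datetime dtype from pd.Timestamp objects.
via_timestamp = pd.Series([pd.Timestamp(d) for d in dates])

results = {
    "dtype string": via_string,
    "np.dtype": via_np_dtype,
    "astype": via_astype,
    "pd.Timestamp": via_timestamp,
}
for name, series in results.items():
    print(f"{name}: {series.dtype}")
```

All four Series hold the same timestamps, so which one to use is mostly a matter of taste.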
Upvotes: 3
Reputation: 23217
You can use pd.to_datetime within the dict, as follows:
df = pd.DataFrame(
{
"group": ["A", "A", "A", "A", "A"],
"date": pd.to_datetime(["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"]),
"value": [10, 20, 16, 31, 56],
}
)
The date column then has dtype datetime64[ns], as you can see with df.info():
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   group   5 non-null      object
 1   date    5 non-null      datetime64[ns]
 2   value   5 non-null      int64
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 248.0+ bytes
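If the input strings are not guaranteed to be clean, pd.to_datetime also takes format and errors arguments. The following is only a sketch, assuming the same YYYY-MM-DD layout as in the question; the malformed entry "not a date" is invented for illustration and is not part of the original data.

```python
import pandas as pd

# Hypothetical input with one malformed entry.
raw_dates = ["2020-01-02", "2020-01-13", "not a date"]

df = pd.DataFrame(
    {
        "group": ["A", "A", "A"],
        # errors="coerce" turns unparseable strings into NaT instead of raising.
        "date": pd.to_datetime(raw_dates, format="%Y-%m-%d", errors="coerce"),
        "value": [10, 20, 16],
    }
)

print(df.dtypes)
print(df["date"].isna().sum(), "unparseable date(s)")
```

The column is still a datetime column at creation time; the bad value simply shows up as NaT.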
Upvotes: 5
Reputation: 120439
I don't see why converting the date column after creating the dataframe is a problem. You can chain the df creation with astype:
df = pd.DataFrame(
{"group": ["A", "A", "A", "A", "A"],
"date": ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
"value": [10, 20, 16, 31, 56],
}).astype({'date': 'datetime64[ns]'})
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 group 5 non-null object
1 date 5 non-null datetime64[ns]
2 value 5 non-null int64
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 248.0+ bytes
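Another way to keep everything in one chained expression is DataFrame.assign with pd.to_datetime. This is just a sketch of the same idea on a shortened frame, not a claim that it is better than astype; the lambda argument name d is arbitrary.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "group": ["A", "A", "A"],
        "date": ["2020-01-02", "2020-01-13", "2020-02-01"],
        "value": [10, 20, 16],
    }
).assign(date=lambda d: pd.to_datetime(d["date"]))  # replace the string column in the chain

print(df.dtypes)
```

Either way, the conversion happens in the same statement that creates the dataframe, so no intermediate frame with a string date column is ever bound to a name.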
Upvotes: 0