Vega

Reputation: 2929

Create dataframe and set column as datetime?

I want to create a pandas dataframe df like:

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
        "value": [10, 20, 16, 31, 56],
    }
)

I want to specify column "date" as dtype=datetime64[ns] while creating the dataframe, not afterwards.

So not like this:

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
        "value": [10, 20, 16, 31, 56],
    }
)
df["date"] = pd.to_datetime(df["date"])

But like this:

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": pd.Series(
            ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
            dtype=np.datetime64,
        ),
        "value": [10, 20, 16, 31, 56],
    }
)

But that gives the error:

ValueError: The 'datetime64' dtype has no unit. Please pass in 'datetime64[ns]' instead.

When I do that:

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": pd.Series(
            ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
            dtype=np.datetime64[ns],
        ),
        "value": [10, 20, 16, 31, 56],
    }
)

I get this error:

NameError: name 'ns' is not defined

So how can I set a specific column as type "datetime64[ns]"?

Upvotes: 4

Views: 3989

Answers (3)

Anurag Dabas

Reputation: 24314

Use dtype='datetime64[ns]':

df=pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": pd.Series(
            ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
            dtype='datetime64[ns]',
        ),
        "value": [10, 20, 16, 31, 56],
    }
)

OR

use astype():

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
        "value": [10, 20, 16, 31, 56],
    }
)
df["date"] = df["date"].astype("datetime64[ns]")

OR

via pd.Timestamp():

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": list(map(pd.Timestamp,["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"])),
        "value": [10, 20, 16, 31, 56],
    }
)
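
As a quick sanity check of the first option, here is a minimal sketch, assuming pandas and NumPy are imported as pd and np (the names dates, df, alt and date_dtype are only for illustration). The attempt in the question fails because np.datetime64[ns] is evaluated as Python code, so ns is looked up as a variable. Passing the dtype as a string, or as np.dtype("datetime64[ns]"), avoids that:

```python
import numpy as np
import pandas as pd

dates = ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"]

# The dtype is given as a string, so the unit "[ns]" is parsed by pandas/NumPy
# instead of being evaluated as a Python name.
df = pd.DataFrame(
    {
        "group": ["A"] * 5,
        "date": pd.Series(dates, dtype="datetime64[ns]"),
        "value": [10, 20, 16, 31, 56],
    }
)

date_dtype = str(df["date"].dtype)
print(date_dtype)  # datetime64[ns]

# Equivalent spelling with an explicit NumPy dtype object.
alt = pd.Series(dates, dtype=np.dtype("datetime64[ns]"))
print(alt.dtype)
```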

Upvotes: 3

SeaBean

Reputation: 23217

You can use pd.to_datetime within the dict, as follows:

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": pd.to_datetime(["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"]),
        "value": [10, 20, 16, 31, 56],
    }
)

The date column then has dtype datetime64[ns], as df.info() shows:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   group   5 non-null      object        
 1   date    5 non-null      datetime64[ns]
 2   value   5 non-null      int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 248.0+ bytes
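
If all the strings share one known layout, passing it explicitly can make the intent clearer. A minimal sketch, assuming ISO-style YYYY-MM-DD dates as in the question (the dates variable is just for illustration):

```python
import pandas as pd

dates = ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"]

df = pd.DataFrame(
    {
        "group": ["A"] * 5,
        # format= tells pandas exactly how each string is laid out;
        # a string that does not match raises an error by default.
        "date": pd.to_datetime(dates, format="%Y-%m-%d"),
        "value": [10, 20, 16, 31, 56],
    }
)

# df.dtypes is a more compact check than df.info().
print(df.dtypes)
```

The exact datetime unit reported may depend on the pandas version, so it is worth checking df.dtypes if a specific unit matters.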

Upvotes: 5

Corralien

Reputation: 120439

I'm not sure why converting the date column after creating the dataframe is a problem, but you can chain the dataframe creation with astype:

df = pd.DataFrame(
    {"group": ["A", "A", "A", "A", "A"],
     "date": ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
     "value": [10, 20, 16, 31, 56],
    }).astype({'date': 'datetime64[ns]'})

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   group   5 non-null      object
 1   date    5 non-null      datetime64[ns]
 2   value   5 non-null      int64
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 248.0+ bytes
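
Another way to keep everything in one expression, sketched here as an alternative to astype (it is not something taken from the question), is to chain DataFrame.assign with pd.to_datetime:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "group": ["A", "A", "A", "A", "A"],
        "date": ["2020-01-02", "2020-01-13", "2020-02-01", "2020-02-23", "2020-03-05"],
        "value": [10, 20, 16, 31, 56],
    }
).assign(date=lambda d: pd.to_datetime(d["date"]))  # replaces the string column

print(df.dtypes)
```

Like the astype version, this still converts after construction internally, but no intermediate dataframe with a string column is ever bound to a name.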

Upvotes: 0
