jbuddy_13

Reputation: 1276

Pandas: Series and DataFrames handling of datetime objects

Pandas handles datetime values differently depending on whether the object is a Series or not. Specifically, the .dt accessor is required on a Series and raises an error on anything else (for example, a single Timestamp). I've spent the better part of an hour tracking this behavior down.

import pandas as pd
data = {'dates':['2019-03-01','2019-03-02'],'event':[0,1]}
df = pd.DataFrame(data)
df['dates'] = pd.to_datetime(df['dates'])

Pandas Series:

df['dates'][0:1].dt.year
>>> 
0    2019
Name: dates, dtype: int64
df['dates'][0:1].year
>>>
AttributeError: 'Series' object has no attribute 'year'

Not Pandas Series:

df['dates'][0].year
>>>
2019
df['dates'][0].dt.year
>>>
AttributeError: 'Timestamp' object has no attribute 'dt'

Does anyone know why Pandas behaves this way? Is this a "feature, not a bug", i.e. is it actually useful in some setting?

Upvotes: 0

Views: 141

Answers (1)

Henry Ecker

Reputation: 35646

This behaviour is consistent with Python. A collection of datetimes is fundamentally different from a single datetime.

We can see this with a plain list vs. a single datetime object:

from datetime import datetime

a = datetime.now()
print(a.year) 
# 2021

list_of_datetimes = [datetime.now(), datetime.now()]
print(list_of_datetimes.year)
# AttributeError: 'list' object has no attribute 'year'

Naturally, a list does not have a year attribute, because in Python we cannot guarantee that the list contains only datetimes.

We would have to apply some function to each element in the list to access the year, for example:

from datetime import datetime

list_of_datetimes = [datetime.now(), datetime.now()]

print(*map(lambda d: d.year, list_of_datetimes))
# 2021 2021

This concept of "applying an operation over a collection of datetimes" is fundamentally what the dt accessor does. By extension, the accessor is unnecessary when working with a single element of a Series, just as it is unnecessary when working with a single datetime.
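To make the parallel concrete, here is a minimal sketch (the variable names are made up for illustration) comparing the dt accessor with an explicit element-wise map over the same Series. Both should give the same years; the accessor is simply the vectorized form.

```python
import pandas as pd

# A small datetime64 Series, built only for this illustration
dates = pd.Series(pd.to_datetime(['2019-03-01', '2019-03-02']))

# Vectorized access over the whole Series through the .dt accessor
years_dt = dates.dt.year

# Element-wise access: each element is a single Timestamp, so no .dt is needed
years_map = dates.map(lambda d: d.year)

print(years_dt.tolist())   # [2019, 2019]
print(years_map.tolist())  # [2019, 2019]
```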


In pandas, we can only use the dt accessor on a Series of datetime-like values.

Pandas needs the guarantee that every element is a datetime before it can apply year to all elements in the Series, and the datetime dtype provides that guarantee:

import pandas as pd

data = {'dates': ['2019-03-01', '2019-03-02'], 'event': [0, 1]}
df = pd.DataFrame(data)
df['dates'] = pd.to_datetime(df['dates'])
print(df['dates'].dt.year)
# 0    2019
# 1    2019
# Name: dates, dtype: int64
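As a rough illustration of that restriction, the sketch below (with hypothetical variable names) tries the dt accessor on a Series of date strings that has not been converted yet. It should raise an AttributeError, and the accessor becomes available only after pd.to_datetime.

```python
import pandas as pd

# Date strings that have NOT been converted to datetimes yet
strings = pd.Series(['2019-03-01', '2019-03-02'])

raised = False
try:
    strings.dt.year
except AttributeError as exc:
    # pandas refuses .dt because the values are not datetime-like
    raised = True
    print(exc)

# After conversion the Series holds datetimes, so .dt works
converted = pd.to_datetime(strings)
print(converted.dt.year.tolist())  # [2019, 2019]
```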

Again, however, since a column of object dtype could contain both datetimes and non-datetimes, we may need to access the individual elements. For example:

import pandas as pd

data = {'dates': ['2019-03-01', 87], 'event': [0, 1]}
df = pd.DataFrame(data)
print(df)
#         dates  event
# 0  2019-03-01      0
# 1          87      1

# Convert only 1 value to datetime
df.loc[0, 'dates'] = pd.to_datetime(df.loc[0, 'dates'])
print(df.loc[0, 'dates'].year)
# 2019
print(df.loc[1, 'dates'].year)
# AttributeError: 'int' object has no attribute 'year'
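One possible way to deal with such a mixed column, sketched here under the assumption that non-datetime values should simply be skipped, is to check each element's type before reading year. The helper year_or_none is a hypothetical name used only for this example.

```python
import pandas as pd

# An object-dtype Series mixing a Timestamp with a non-datetime value
mixed = pd.Series([pd.Timestamp('2019-03-01'), 87], dtype=object)

def year_or_none(value):
    """Return the year for Timestamp values, and None for anything else."""
    return value.year if isinstance(value, pd.Timestamp) else None

# Element-wise access, since .dt is not available on an object column
years = [year_or_none(v) for v in mixed]
print(years)  # [2019, None]
```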

Upvotes: 1
