Reputation: 1276
Pandas behaves differently with datetime information depending on whether the object is a Series. Specifically, .dt is required if it's a Series, and .dt throws an error if it's not. I've spent the better part of an hour tracking this behavior down.
import pandas as pd
data = {'dates':['2019-03-01','2019-03-02'],'event':[0,1]}
df = pd.DataFrame(data)
df['dates'] = pd.to_datetime(df['dates'])
Pandas Series:
df['dates'][0:1].dt.year
>>>
0 2019
Name: dates, dtype: int64
df['dates'][0:1].year
>>>
AttributeError: 'Series' object has no attribute 'year'
Not Pandas Series:
df['dates'][0].year
>>>
2019
df['dates'][0].dt.year
>>>
AttributeError: 'Timestamp' object has no attribute 'dt'
Does anyone know why Pandas behaves this way? Is this a "feature, not a bug", as in it's actually useful in some setting?
Upvotes: 0
Views: 141
Reputation: 35646
This behaviour is consistent with Python. A collection of datetimes is fundamentally different from a single datetime.
We can see this with a plain list vs a datetime object:
from datetime import datetime
a = datetime.now()
print(a.year)
# 2021
list_of_datetimes = [datetime.now(), datetime.now()]
print(list_of_datetimes.year)
# AttributeError: 'list' object has no attribute 'year'
Naturally, a list does not have a year attribute, because Python cannot guarantee that the list contains only datetimes.
We would have to apply some function to each element in the list to access the year, for example:
from datetime import datetime
list_of_datetimes = [datetime.now(), datetime.now()]
print(*map(lambda d: d.year, list_of_datetimes))
# 2021 2021
This concept of "applying an operation over a collection of datetimes" is fundamentally what the dt accessor does. By extension, the accessor is unnecessary when working with a single element of a Series, just as it is unnecessary when working with a single datetime.
In pandas, the dt accessor is only available on a Series with a datetime-like dtype (for example datetime64[ns]).
That dtype is what guarantees every element is a datetime, so the year can be extracted from all elements in the Series at once:
import pandas as pd
data = {'dates': ['2019-03-01', '2019-03-02'], 'event': [0, 1]}
df = pd.DataFrame(data)
df['dates'] = pd.to_datetime(df['dates'])
print(df['dates'].dt.year)
# 0    2019
# 1    2019
# Name: dates, dtype: int64
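One way to see that guarantee is to inspect the dtype before and after conversion. This is a minimal sketch using pandas.api.types.is_datetime64_any_dtype; the names raw and converted are only for illustration.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Strings that merely look like dates: no datetime guarantee yet
raw = pd.Series(['2019-03-01', '2019-03-02'])

# After conversion the Series carries a datetime64 dtype
converted = pd.to_datetime(raw)

raw_is_datetime = is_datetime64_any_dtype(raw)
converted_is_datetime = is_datetime64_any_dtype(converted)

print(raw_is_datetime)        # False
print(converted_is_datetime)  # True
```

Only the converted Series exposes the dt accessor, because only there does the dtype tell pandas what every element is.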
However, a column of object dtype can contain both datetimes and non-datetimes, so there is no such guarantee and we may need to access the elements individually. For example:
import pandas as pd
data = {'dates': ['2019-03-01', 87], 'event': [0, 1]}
df = pd.DataFrame(data)
print(df)
#         dates  event
# 0  2019-03-01      0
# 1          87      1
# Convert only 1 value to datetime
df.loc[0, 'dates'] = pd.to_datetime(df.loc[0, 'dates'])
print(df.loc[0, 'dates'].year)
# 2019
print(df.loc[1, 'dates'].year)
# AttributeError: 'int' object has no attribute 'year'
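To tie this back to the accessor, the following sketch shows what happens on such a mixed object column: the dt accessor is rejected for the Series as a whole, while a per-element check still works. The Series here is built directly rather than through the DataFrame above, purely to keep the example short.

```python
import pandas as pd

# Hypothetical mixed column: one Timestamp and one plain int, object dtype
mixed = pd.Series([pd.Timestamp('2019-03-01'), 87], dtype=object)

# The accessor refuses to work because the dtype is not datetime-like
try:
    mixed.dt.year
    accessor_worked = True
except AttributeError:
    accessor_worked = False

# Falling back to per-element access, guarding against non-datetimes
years = [v.year if isinstance(v, pd.Timestamp) else None for v in mixed]

print(accessor_worked)
print(years)
```

The per-element guard is one possible way to handle mixed data; converting the whole column to a proper datetime dtype first is usually the simpler route when the data allows it.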
Upvotes: 1