Reputation: 639
I am using pandas version 0.16.2. I want to extract Year and Month of the date columns.
I read the data
df = pd.read_csv(raw_data.csv,
parse_dates=['EOM_DEFAULT_DATE','RESOLUTION_DATE'], low_memory=False)
'EOM_DEFAULT_DATE' looks like:
0 31-JAN-07 12.00.00.000000000 AM
1 31-JAN-07 12.00.00.000000000 AM
Name: EOM_DEFAULT_DATE, dtype: object
'RESOLUTION DATE' looks like:
0 2008-03-31
1 2008-03-31
Name: RESOLUTION_DATE, dtype: datetime64[ns]
Specifically, I want to extract Year this way, but get this error:
df['YEAR']=pd.DatetimeIndex(df['RESOLUTION_DATE']).year
---
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Also, I get an error when trying to extract month:
df['MNTH']=pd.DatetimeIndex(df['EOM_DEFAULT_DATE']).month
---
File "<ipython-input-61-d7aec9a17a8f>", line 1, in <module>
File "C:\Continuum\Anaconda\lib\site-packages\pandas\util\decorators.py", line 88, in wrapper
return func(*args, **kwargs)
File "C:\Continuum\Anaconda\lib\site-packages\pandas\tseries\index.py", line 292, in __new__
yearfirst=yearfirst)
File "C:\Continuum\Anaconda\lib\site-packages\pandas\tseries\index.py", line 1936, in _str_to_dt_array
data = _algos.arrmap_object(arr, parser)
File "pandas\src\generated.pyx", line 2295, in pandas.algos.arrmap_object (pandas\algos.c:77984)
File "C:\Continuum\Anaconda\lib\site-packages\pandas\tseries\index.py", line 1932, in parser
yearfirst=yearfirst)
File "C:\Continuum\Anaconda\lib\site-packages\pandas\tseries\tools.py", line 494, in parse_time_string
raise DateParseError(e)
DateParseError: unknown string format
Using this exact code, I know others can run the code fine, and extract year and month. What am I missing?
Upvotes: 2
Views: 955
Reputation: 24742
You can use a .dt
accessor to get the year and month on a pd.Series
whose values are datetime64
.
df['YEAR'] = df['RESOLUTION_DATE'].dt.year
To parse the date, you need to supply your datetime format.
dt_str = '31-JAN-07 12.00.00.000000000 AM'
fmt = '%d-%b-%y %H.%M.%S.%f %p'
pd.to_datetime(dt_str, format=fmt)
#output: Timestamp('2007-01-31 12:00:00')
Maybe try not parsing the date when reading csv because you have two date columns and they have different formats. Just read-in the raw string. and then convert string to datetime object in pandas.
df['EOM_DEFAULT_DATE'] = pd.to_datetime(df['EOM_DEFAULT_DATE'], format='%d-%b-%y %H.%M.%S.%f %p')
df['RESOLUTION_DATE'] = pd.to_datetime(df['RESOLUTION_DATE'])
Upvotes: 2