Reputation: 183
I have a dataset with two columns:
import pandas as pd
df = pd.DataFrame({'Date': [195101, 195102, 195103, 195104, 195105],
                   'Value': [1.5, 0.9, -0.1, -0.3, -0.7]})
Date Value
0 195101 1.5
1 195102 0.9
2 195103 -0.1
3 195104 -0.3
4 195105 -0.7
Checking the column types:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 5 non-null int64
1 Value 5 non-null float64
dtypes: float64(1), int64(1)
memory usage: 208.0 bytes
'Date' is of type int64. I tried to convert it to datetime with
df['Date'] = pd.to_datetime(df['Date'])
The result is this:
Date Value
0 1970-01-01 00:00:00.000195101 1.5
1 1970-01-01 00:00:00.000195102 0.9
2 1970-01-01 00:00:00.000195103 -0.1
3 1970-01-01 00:00:00.000195104 -0.3
4 1970-01-01 00:00:00.000195105 -0.7
Instead, I would like to get a Year-Month format:
Date Value
0 1951-01 1.5
1 1951-02 0.9
2 1951-03 -0.1
3 1951-04 -0.3
4 1951-05 -0.7
Solved using the accepted answer below:
df['Date'] = pd.to_datetime(df.Date.astype(str), format='%Y%m').dt.to_period('M')
Upvotes: 1
Views: 160
Reputation: 59579
Your Date column is a numeric type, so by default pandas interprets each value as a number of nanoseconds since the POSIX epoch (1970-01-01). To get what you want, convert the column to strings first and then pass the matching format.
# To properly format your numeric-dates:
pd.to_datetime(df.Date.astype(str), format='%Y%m')
#0 1951-01-01
#1 1951-02-01
#2 1951-03-01
#3 1951-04-01
#4 1951-05-01
If you want monthly periods instead of full timestamps, chain .dt.to_period('M'):
pd.to_datetime(df.Date.astype(str), format='%Y%m').dt.to_period('M')
#0 1951-01
#1 1951-02
#2 1951-03
#3 1951-04
#4 1951-05
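For completeness, here is a self-contained sketch that puts the steps above together on the sample data from the question. The variable names raw, naive, parsed and alt are only illustrative, and the last step (splitting the integer with // and % instead of going through strings) is an alternative I am adding as an assumption about what might suit you, not something the approach above requires.

```python
import pandas as pd

df = pd.DataFrame({'Date': [195101, 195102, 195103, 195104, 195105],
                   'Value': [1.5, 0.9, -0.1, -0.3, -0.7]})
raw = df['Date'].copy()  # keep the original integers for comparison

# Default behaviour: integers are read as nanoseconds since 1970-01-01,
# so every value lands a fraction of a second after the epoch.
naive = pd.to_datetime(raw)

# Casting to str lets format='%Y%m' read the digits as year and month.
parsed = pd.to_datetime(raw.astype(str), format='%Y%m')

# Monthly periods drop the day component and display as YYYY-MM.
df['Date'] = parsed.dt.to_period('M')

# Illustrative alternative (an assumption, not part of the answer above):
# split the integer arithmetically and assemble the date from its parts.
parts = pd.DataFrame({'year': raw // 100, 'month': raw % 100, 'day': 1})
alt = pd.to_datetime(parts).dt.to_period('M')

print(df)
```

Both routes should give the same Period column; the string route is shorter, while the arithmetic route avoids the round trip through str.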
Upvotes: 1