Lorenço

pandas: incorrect conversion of int to date

I have a dataset with two columns:

import pandas as pd

df = pd.DataFrame({'Date': [195101, 195102, 195103, 195104, 195105],
                   'Value': [1.5, 0.9, -0.1, -0.3, -0.7]})
     Date  Value
0  195101    1.5
1  195102    0.9
2  195103   -0.1
3  195104   -0.3
4  195105   -0.7

Checking the column types with

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    5 non-null      int64  
 1   Value   5 non-null      float64
dtypes: float64(1), int64(1)
memory usage: 208.0 bytes

'Date' is of type int64. When I try to convert it to datetime with

df['Date'] = pd.to_datetime(df['Date'])

The result is this:

                           Date  Value
0 1970-01-01 00:00:00.000195101    1.5
1 1970-01-01 00:00:00.000195102    0.9
2 1970-01-01 00:00:00.000195103   -0.1
3 1970-01-01 00:00:00.000195104   -0.3
4 1970-01-01 00:00:00.000195105   -0.7

Instead, I would like to get the Year-Month format:

      Date  Value
0  1951-01    1.5
1  1951-02    0.9
2  1951-03   -0.1
3  1951-04   -0.3
4  1951-05   -0.7

Solved with the accepted answer below:

df['Date'] = pd.to_datetime(df.Date.astype(str), format='%Y%m').dt.to_period('M')
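If the column might contain a malformed code, the one-liner above raises an error on the first bad value. A minimal sketch, assuming you would rather mark such rows as missing and inspect them later: passing errors='coerce' to pd.to_datetime turns values that do not match the format into NaT, which .dt.to_period then carries through as a missing period. The series `codes` and its month-13 entry are hypothetical test data, not from the question.

```python
import pandas as pd

# Hypothetical column with one malformed code (month 13) for illustration.
codes = pd.Series([195101, 195113, 195103])

# errors='coerce' turns values that do not match '%Y%m' into NaT
# instead of raising, so the bad rows can be found afterwards.
months = pd.to_datetime(codes.astype(str), format='%Y%m',
                        errors='coerce').dt.to_period('M')

print(months.isna().tolist())  # [False, True, False]
```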

Answers (1)

ALollz

Your date column has a numeric dtype, so by default pandas interprets each value as a number of nanoseconds since the POSIX epoch (1970-01-01). To get what you want, convert the column to strings first and then pass the matching format.
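To see that interpretation directly, the sketch below (using a hypothetical three-value series `raw`) rebuilds the same timestamps by hand, adding the integers as nanosecond offsets to the epoch; both routes agree and reproduce the odd output from the question.

```python
import pandas as pd

# Integer input is read with the default unit="ns": each value is taken
# as a count of nanoseconds elapsed since 1970-01-01 00:00:00.
raw = pd.Series([195101, 195102, 195103])
converted = pd.to_datetime(raw)

# Rebuild the same timestamps manually: epoch + N nanoseconds.
by_hand = pd.Timestamp("1970-01-01") + pd.to_timedelta(raw, unit="ns")

print(converted[0])                  # 1970-01-01 00:00:00.000195101
print((converted == by_hand).all())  # True
```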

# Parse the YYYYMM integers as strings with an explicit format:
pd.to_datetime(df.Date.astype(str), format='%Y%m')
#0   1951-01-01
#1   1951-02-01
#2   1951-03-01
#3   1951-04-01
#4   1951-05-01
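If you would rather avoid the round trip through strings, one alternative (not part of the original answer) is to split each YYYYMM integer arithmetically and let pd.to_datetime assemble dates from a DataFrame with year/month/day columns. This is a sketch assuming every value is a valid six-digit YYYYMM code; the day is fixed to 1 to match the output of the string-based conversion.

```python
import pandas as pd

df = pd.DataFrame({'Date': [195101, 195102, 195103, 195104, 195105],
                   'Value': [1.5, 0.9, -0.1, -0.3, -0.7]})

# Split YYYYMM arithmetically instead of going through strings:
# integer division gives the year, the remainder gives the month.
parts = pd.DataFrame({'year': df['Date'] // 100,
                      'month': df['Date'] % 100,
                      'day': 1})
dates = pd.to_datetime(parts)

# Compare against the string-based conversion with format='%Y%m'.
via_str = pd.to_datetime(df['Date'].astype(str), format='%Y%m')
print((dates == via_str).all())  # True
```

Integer division keeps the data numeric throughout, but it does not validate anything on its own: a value such as 195113 still cannot become a valid date, so the input assumption matters.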

And if you want monthly periods, chain .dt.to_period('M'):

pd.to_datetime(df.Date.astype(str), format='%Y%m').dt.to_period('M')
#0    1951-01
#1    1951-02
#2    1951-03
#3    1951-04
#4    1951-05
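Whether you want periods or plain strings depends on what happens next. A short comparison, assuming the df from the question: to_period('M') keeps a period[M] dtype that still supports month arithmetic, while dt.strftime('%Y-%m') (an alternative not used in the answer) produces text labels that only look the same when printed.

```python
import pandas as pd

df = pd.DataFrame({'Date': [195101, 195102, 195103, 195104, 195105],
                   'Value': [1.5, 0.9, -0.1, -0.3, -0.7]})
dates = pd.to_datetime(df['Date'].astype(str), format='%Y%m')

# Option 1: monthly periods keep date semantics (dtype period[M]).
periods = dates.dt.to_period('M')

# Option 2: plain 'YYYY-MM' strings, suitable for display or export only.
labels = dates.dt.strftime('%Y-%m')

print(periods.dtype)        # period[M]
print(periods.iloc[0] + 1)  # 1951-02
print(labels.tolist())      # ['1951-01', '1951-02', '1951-03', '1951-04', '1951-05']
```

Periods are usually the better default for monthly data, since sorting, shifting, and resampling keep working; strings are best kept for the final output step.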
