Tahnoon Pasha

Reputation: 6018

Can't convert R date ordinal to Python accurately

Following on from the question here:

I'm trying to create the series by hand here using Rpy2

import rpy2.robjects as ro
from rpy2.robjects.packages import importr
import pandas.rpy.common as com

pa = importr("pa")

ro.r("data(jan)")
jan = com.load_data('jan')

jan_r  = com.convert_to_r_dataframe(jan)

name = ro.StrVector([str(i) for i in jan['name']])
sector = ro.StrVector([str(i) for i in jan['sector']])
date = ro.StrVector([str(i) for i in jan['date']])

I get a date number of 14610 in the date field, representing 2010-01-01, which I suspect uses a 1970-01-01 origin. However, I can't find anything in the datetime module that lets me change the origin for the date, so I don't know how to reset it.

My questions:

  1. Is the origin for the R sourced date 1970-01-01?
  2. Is there a way to set an origin and convert to a datetime.datetime object in Python?
  3. Am I missing something more obvious here?

Thanks

Upvotes: 4

Views: 1338

Answers (2)

Camilo Abboud

Reputation: 919

OK, but how do you express this number correctly in Python?

import pandas as pd
pd.to_datetime(18402, unit='D', origin='1970-1-1')

18402 corresponds to 2020-05-20. The origin shown is the default (the Unix epoch, 1970-01-01), so you can omit the origin parameter.

Upvotes: 2

Richie Cotton

Reputation: 121127

Is the origin for the R sourced date 1970-01-01?

From ?Date:

Dates are represented as the number of days since 1970-01-01, with negative values for earlier dates.


I get a date number of 14610 in the date field representing 2010-01-01 which I suspect is a 1970-01-01 origin.

Well suspected.

as.Date(14610, origin = "1970-01-01")
## [1] "2010-01-01"

Is there a way to set an origin and convert to a datetime.datetime object in Python?

Python datetime docs show several ways of constructing a date.

You can use the datetime.date(year, month, day) syntax, where those values can be retrieved from the R dates using year(x), month(x) and mday(x) (available in, for example, the lubridate package), where x represents your date vector.

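You can use the date.fromtimestamp(timestamp) syntax, where the timestamp is a number of seconds since 1970-01-01, which for an R date is as.numeric(x) * 86400. Note that format(x) returns a string such as "2010-01-01", which would need to be parsed with datetime.strptime rather than passed to fromtimestamp.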
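As a rough sketch of the timestamp route, assuming the R date arrives in Python as its plain day count (for example via as.numeric(x)): the helper name r_days_to_date_via_timestamp is hypothetical, not part of any library. It converts in UTC, because date.fromtimestamp interprets the value in the local time zone and can land on the previous day in zones behind UTC.

```python
from datetime import date, datetime, timezone

SECONDS_PER_DAY = 86400


def r_days_to_date_via_timestamp(days):
    """Convert an R Date number (days since 1970-01-01) to a datetime.date.

    Hypothetical helper for illustration. Negative timestamps (dates
    before 1970) may not be supported by fromtimestamp on every platform.
    """
    timestamp = int(days) * SECONDS_PER_DAY
    # Convert in UTC so the local time zone cannot shift the calendar day.
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date()


print(r_days_to_date_via_timestamp(14610))  # 2010-01-01
```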

The documentation for date.fromordinal(ordinal) says that it returns:

the date corresponding to the Gregorian ordinal, where January 1 of year 1 has ordinal 1

So presumably your problem is that you are passing the dates as numbers, which R counts as days from 1 January 1970 and Python's fromordinal counts as days from 1 January 0001.
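One way to bridge the two origins, sketched here under the assumption that the values are whole day counts (possibly as strings, as in the question's StrVector code), is either to add a timedelta to 1970-01-01 or to shift the number by the ordinal of 1970-01-01 before calling fromordinal. The names R_EPOCH, r_days_to_date and r_days_to_date_via_ordinal are hypothetical.

```python
from datetime import date, timedelta

# R counts Date values as days since this origin.
R_EPOCH = date(1970, 1, 1)


def r_days_to_date(days):
    """Add the R day count to the 1970-01-01 origin."""
    # int(float(...)) accepts ints, floats and strings such as "14610".
    return R_EPOCH + timedelta(days=int(float(days)))


def r_days_to_date_via_ordinal(days):
    """Shift the R day count onto Python's ordinal scale (0001-01-01 is 1)."""
    return date.fromordinal(int(float(days)) + R_EPOCH.toordinal())


print(r_days_to_date(14610))               # 2010-01-01
print(r_days_to_date_via_ordinal("14610"))  # 2010-01-01
```

Both functions also handle negative day counts, which R uses for dates before 1970. If a datetime.datetime is needed rather than a date, datetime.combine(d, time.min) converts the result.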

Upvotes: 5
