Kent D.

Reputation: 13

rpy2 ggplot2 Error: Invalid input: date_trans works with objects of class Date only

I am trying to use the rpy2 package to call ggplot2 from a Python script to plot time series data, but I get an error when I try to adjust the date limits of the x-scale. The rpy2 documentation provides this guidance (https://rpy2.readthedocs.io/en/version_2.8.x/vector.html?highlight=date%20vector): "Sequences of date or time points can be stored in POSIXlt or POSIXct objects. Both can be created from Python sequences of time.struct_time objects or from R objects."

Here is my example code:

import numpy as np
import pandas as pd
import datetime as dt
from rpy2 import robjects as ro
from rpy2.robjects import pandas2ri
import rpy2.robjects.lib.ggplot2 as ggplot2
pandas2ri.activate()

#Create a random dataframe with time series data
df = pd.DataFrame({'Data': np.random.normal(50, 5, 10),
                  'Time': [dt.datetime(2000, 7, 23), dt.datetime(2001, 7, 15),
                           dt.datetime(2002, 7, 30), dt.datetime(2003, 8, 5),
                           dt.datetime(2004, 6, 28), dt.datetime(2005, 7, 23),
                           dt.datetime(2006, 7, 15), dt.datetime(2007, 7, 30),
                           dt.datetime(2008, 8, 5), dt.datetime(2009, 6, 28)]})

#Create a POSIXct vector from time.struct_time objects to store the x limits
date_min = dt.datetime(2000, 1, 1).timetuple()
date_max = dt.datetime(2010, 1, 1).timetuple()
date_range = ro.vectors.POSIXct((date_min, date_max))

#Generate the plot
gp = ggplot2.ggplot(df)
gp = (gp + ggplot2.aes_string(x='Time', y='Data') +
      ggplot2.geom_point() +
      ggplot2.scale_x_date(limits=date_range))

When I run this code, I get the following error message:

Error: Invalid input: date_trans works with objects of class Date only

Instead of the POSIXct object, I have also tried the DateVector object. I have also tried using base.as_Date() to convert date strings into R dates and feeding those into the R vector objects. I always get the same error message. In R, I would change the scale limits like this:

gp + scale_x_date(limits = as.Date(c("2000/01/01", "2010/01/01")))

How do I translate this into rpy2 so that my Python script will run?

Upvotes: 1

Views: 1025

Answers (1)

Parfait

Reputation: 107687

Consider calling base R functions just as you would in R; in rpy2 you can import base as a package with importr. FYI: R sessions load base, stats, utils, and other built-in packages implicitly, without library() lines, whereas in rpy2 you import them explicitly to call them as Python objects.

Datetime Processing

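Also, convert the Python datetime objects to strings with strftime() instead of timetuple(); strings translate more easily to R. Because the pandas datetime column arrives in R as POSIXct rather than Date, use scale_x_datetime() in place of scale_x_date(), which accepts only Date objects (hence the error).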

from rpy2.robjects.packages import importr
base = importr('base')
...
date_min = dt.datetime(2000, 1, 1).strftime('%Y-%m-%d')
date_max = dt.datetime(2010, 1, 1).strftime('%Y-%m-%d')
date_range = base.as_POSIXct(base.c(date_min, date_max), format="%Y-%m-%d")
...
ggplot2.scale_x_datetime(limits=date_range))
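
To make the difference between the two conversions concrete, here is a minimal stdlib-only sketch (no rpy2 involved) of what each call produces on the Python side. The variable names as_struct and limits are illustrative only and do not appear in the original code.

```python
import datetime as dt

date_min = dt.datetime(2000, 1, 1)
date_max = dt.datetime(2010, 1, 1)

# timetuple() yields a time.struct_time, which has to be mapped field by field
as_struct = date_min.timetuple()
print(type(as_struct).__name__, as_struct.tm_year)

# strftime() yields plain strings that R's as.POSIXct() or as.Date() can parse
# when given the same format string
limits = [d.strftime('%Y-%m-%d') for d in (date_min, date_max)]
print(limits)
```

The strings in limits are what gets passed to base.c() and then to base.as_POSIXct() in the snippet above.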

GGPlot Plus Operator

Additionally, Python's + operator is not quite the same as ggplot2's, which is really the method ggplot2:::`+.gg`. As pointed out in the SO post "How is ggplot2 plus operator defined?", this function conditionally runs add_theme() or add_ggplot(), which you need to replicate in Python. Because that R function is internal to the ggplot2 namespace and not readily available through ggplot2.* calls, use R's utils::getAnywhere("+.gg") to retrieve it as a callable function.

Consequently, you need to replace each + with an explicit call to that function so it fits Python's object model, which you can do with base R's Reduce(). So the following in R:

gp <- ggplot(df)
gp <- gp + aes_string(x='Time', y='Data') +
  geom_point() +
  scale_x_datetime(limits=date_range)

translates equivalently to:

gp <- Reduce(ggplot2:::`+.gg`, list(ggplot(df), aes_string(x='Time', y='Data'), 
                                    geom_point(), scale_x_datetime(limits=date_range)))

Or, with getAnywhere(), after the ggplot2 library is loaded in the session:

gg_proc <- getAnywhere("+.gg")

gp <- Reduce(gg_proc$objs[[1]], list(ggplot(df), aes_string(x='Time', y='Data'), 
                                     geom_point(), scale_x_datetime(limits=date_range)))
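
For readers more at home in Python, the fold that Reduce() performs can be sketched with functools.reduce. This only illustrates the folding pattern: the Plot class and the add_layer function are hypothetical stand-ins, not rpy2 or ggplot2 objects.

```python
from functools import reduce


class Plot:
    """Hypothetical stand-in for a ggplot object (not an rpy2 or ggplot2 class)."""

    def __init__(self, layers=()):
        self.layers = tuple(layers)


def add_layer(plot, layer):
    """Hypothetical binary function playing the role of `+.gg`."""
    return Plot(plot.layers + (layer,))


# Like R's Reduce(f, list(a, b, c, d)), this computes f(f(f(a, b), c), d):
# the first element is the starting plot, and each later element is added in turn.
parts = [Plot(), "aes_string", "geom_point", "scale_x_datetime"]
gp = reduce(add_layer, parts)
print(gp.layers)
```

The first list element plays the role of ggplot(df), and every later element is combined with the running result, which is the same left-to-right order the chained + expressions follow in R.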

Rpy2

Below is the full code in rpy2. Because the R objects run non-interactively inside a Python script, plots are not shown on screen and need to be saved to disk, which ggsave() handles:

import numpy as np
import pandas as pd
import datetime as dt

from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr

# IMPORT R PACKAGES
base = importr('base')
utils = importr('utils')
ggplot2 = importr('ggplot2')

pandas2ri.activate()

# CREATE RANDOM (SEEDED) DATAFRAME WITH TIME SERIES DATA
np.random.seed(6252018)
df = pd.DataFrame({'Data': np.random.normal(50, 5, 10),
                   'Time': [dt.datetime(2000, 7, 23), dt.datetime(2001, 7, 15),
                            dt.datetime(2002, 7, 30), dt.datetime(2003, 8, 5),
                            dt.datetime(2004, 6, 28), dt.datetime(2005, 7, 23),
                            dt.datetime(2006, 7, 15), dt.datetime(2007, 7, 30),
                            dt.datetime(2008, 8, 5), dt.datetime(2009, 6, 28)]})

# CONVERT TO POSIXct VECTOR
date_min = dt.datetime(2000, 1, 1).strftime('%Y-%m-%d')
date_max = dt.datetime(2010, 1, 1).strftime('%Y-%m-%d')
date_range = base.as_POSIXct(base.c(date_min, date_max), format="%Y-%m-%d")

# RETRIEVE NEEDED FUNCTION
# getAnywhere() RETURNS A LIST WHOSE SECOND ELEMENT (objs) HOLDS THE MATCHING OBJECTS
gg_plot_func = utils.getAnywhere("+.gg")

# PRODUCE PLOT
gp = base.Reduce(gg_plot_func[1][0], base.list(ggplot2.ggplot(df),
                                               ggplot2.aes_string(x='Time', y='Data'),
                                               ggplot2.geom_point(),
                                               ggplot2.scale_x_datetime(limits=date_range)))
# SAVE PLOT TO DISK
ggplot2.ggsave(filename="myPlot.png", plot=gp, device="png", path="/path/to/plot/output")

Output (rendered in Python)

[Image: plot output]

Upvotes: 1
