I am trying to use the rpy2 package to call ggplot2 from a Python script to plot time series data. I get an error when I try to adjust the date limits of the x-scale. The rpy2 documentation provides this guidance (https://rpy2.readthedocs.io/en/version_2.8.x/vector.html?highlight=date%20vector): "Sequences of date or time points can be stored in POSIXlt or POSIXct objects. Both can be created from Python sequences of time.struct_time objects or from R objects."
Here is my example code:
import numpy as np
import pandas as pd
import datetime as dt
from rpy2 import robjects as ro
from rpy2.robjects import pandas2ri
import rpy2.robjects.lib.ggplot2 as ggplot2
pandas2ri.activate()
#Create a random dataframe with time series data
df = pd.DataFrame({'Data': np.random.normal(50, 5, 10),
'Time': [dt.datetime(2000, 7, 23), dt.datetime(2001, 7, 15),
dt.datetime(2002, 7, 30), dt.datetime(2003, 8, 5),
dt.datetime(2004, 6, 28), dt.datetime(2005, 7, 23),
dt.datetime(2006, 7, 15), dt.datetime(2007, 7, 30),
dt.datetime(2008, 8, 5), dt.datetime(2009, 6, 28)]})
#Create a POSIXct vector from time.struct_time objects to store the x limits
date_min = dt.datetime(2000, 1, 1).timetuple()
date_max = dt.datetime(2010, 1, 1).timetuple()
date_range = ro.vectors.POSIXct((date_min, date_max))
#Generate the plot
gp = ggplot2.ggplot(df)
gp = (gp + ggplot2.aes_string(x='Time', y='Data') +
ggplot2.geom_point() +
ggplot2.scale_x_date(limits=date_range))
When I run this code, I get the following error message:
Error: Invalid input: date_trans works with objects of class Date only
Instead of the POSIXct object, I have also tried the DateVector object. I have also tried using base.as_Date() to convert date strings into R dates and feeding those into the R vector objects. I always get the same error message. In R, I would change the scale limits like this:
gp + scale_x_date(limits = as.Date(c("2000/01/01", "2010/01/01")))
How do I translate this into rpy2 so that my Python script will run?
Reputation: 107687
Consider calling base R functions just as you would in R; in rpy2 you can import base as a library. FYI: in R sessions, base, stats, utils, and other built-in libraries are loaded implicitly, without explicit library lines.
Datetime Processing
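Also, convert Python datetime objects to strings with strftime instead of timetuple(), which translates to R more easily. Because the Time column holds datetimes (POSIXct in R), use scale_x_datetime with POSIXct limits rather than scale_x_date, which, as the error message says, works with objects of class Date only.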
from rpy2.robjects.packages import importr
base = importr('base')
...
date_min = dt.datetime(2000, 1, 1).strftime('%Y-%m-%d')
date_max = dt.datetime(2010, 1, 1).strftime('%Y-%m-%d')
date_range = base.as_POSIXct(base.c(date_min, date_max), format="%Y-%m-%d")
...
ggplot2.scale_x_datetime(limits=date_range))
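The string conversion step can be checked on its own, without rpy2. The following is only a minimal sketch in pure Python; the helper name to_r_date_strings is hypothetical and not part of rpy2 or of the answer's code. It shows that strftime produces strings matching the format string handed to as_POSIXct above.

```python
import datetime as dt

# Same format string that is passed to base.as_POSIXct(..., format=...)
R_DATE_FORMAT = "%Y-%m-%d"


def to_r_date_strings(dates, fmt=R_DATE_FORMAT):
    """Hypothetical helper: format Python datetimes as strings R can parse."""
    return [d.strftime(fmt) for d in dates]


limits = to_r_date_strings([dt.datetime(2000, 1, 1), dt.datetime(2010, 1, 1)])
print(limits)  # ['2000-01-01', '2010-01-01']
```

The resulting list holds plain strings, so it can be passed to base.c(...) and then to base.as_POSIXct(...) as in the snippet above.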
GGPlot Plus Operator
Additionally, when ggplot2 is loaded with importr, the Python + operator is not quite the same as ggplot2's, which is really ggplot2:::`+.gg`. As pointed out in the SO post "How is ggplot2 plus operator defined?", this function conditionally runs add_theme() or add_ggplot(), which you need to replicate in Python. Because this R function is internal to the ggplot2 namespace and not readily available through ggplot2.* calls, use R's utils::getAnywhere("+.gg") to retrieve it as a callable function.
Consequently, you need to replace the + with explicit, qualified function calls that fit Python's object model. You can do so with base R's Reduce. So the following in R:
gp <- ggplot(df)
gp <- gp + aes_string(x='Time', y='Data') +
geom_point() +
scale_x_datetime(limits=date_range)
translates equivalently to:
gp <- Reduce(ggplot2:::`+.gg`, list(ggplot(df), aes_string(x='Time', y='Data'),
geom_point(), scale_x_datetime(limits=date_range)))
Or with getAnywhere(), after the ggplot2 library is loaded in the session:
gg_proc <- getAnywhere("+.gg")
gp <- Reduce(gg_proc$objs[[1]], list(ggplot(df), aes_string(x='Time', y='Data'),
geom_point(), scale_x_datetime(limits=date_range)))
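For readers more familiar with Python, the folding done by R's Reduce behaves like functools.reduce. The sketch below is only an analogy and does not call R: add_layer is a hypothetical stand-in for ggplot2's `+.gg`, and the "plot" is just a list of component names.

```python
from functools import reduce


def add_layer(plot, layer):
    """Hypothetical stand-in for `+.gg`: return a new plot with one more component."""
    return plot + [layer]


# First element plays the role of ggplot(df); the rest are added left to right.
components = [["ggplot(df)"], "aes_string", "geom_point", "scale_x_datetime"]

# reduce(f, [a, b, c, d]) computes f(f(f(a, b), c), d), just as Reduce does in R.
plot = reduce(add_layer, components)
print(plot)  # ['ggplot(df)', 'aes_string', 'geom_point', 'scale_x_datetime']
```

Each step takes the plot built so far and one more component, which is the same two-argument shape that `+.gg` expects.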
Rpy2
Below is the full code in rpy2. Because you run R objects layered in a Python script non-interactively, plots will not display on screen and need to be saved, which can be done with ggsave:
import numpy as np
import pandas as pd
import datetime as dt
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr
# IMPORT R PACKAGES
base = importr('base')
utils = importr('utils')
ggplot2 = importr('ggplot2')
pandas2ri.activate()
# CREATE RANDOM (SEEDED) DATAFRAME WITH TIME SERIES DATA
np.random.seed(6252018)
df = pd.DataFrame({'Data': np.random.normal(50, 5, 10),
'Time': [dt.datetime(2000, 7, 23), dt.datetime(2001, 7, 15),
dt.datetime(2002, 7, 30), dt.datetime(2003, 8, 5),
dt.datetime(2004, 6, 28), dt.datetime(2005, 7, 23),
dt.datetime(2006, 7, 15), dt.datetime(2007, 7, 30),
dt.datetime(2008, 8, 5), dt.datetime(2009, 6, 28)]})
# CONVERT TO POSIXct VECTOR
date_min = dt.datetime(2000, 1, 1).strftime('%Y-%m-%d')
date_max = dt.datetime(2010, 1, 1).strftime('%Y-%m-%d')
date_range = base.as_POSIXct(base.c(date_min, date_max), format="%Y-%m-%d")
# RETRIEVE NEEDED FUNCTION
gg_plot_func = utils.getAnywhere("+.gg")
# PRODUCE PLOT
gp = base.Reduce(gg_plot_func[1][0], base.list(ggplot2.ggplot(df),
ggplot2.aes_string(x='Time', y='Data'),
ggplot2.geom_point(),
ggplot2.scale_x_datetime(limits=date_range)))
# SAVE PLOT TO DISK
ggplot2.ggsave(filename="myPlot.png", plot=gp, device="png", path="/path/to/plot/output")
Output (rendered in Python)