PatrickT

plotting annual time series with pretty labels

I have (macroeconomic) annual data from the "Penn World Tables". I'm having problems with the date labels. As you can see below, the dates are expressed as decimals. I've made several attempts to fix it, but failed repeatedly: I turn to you for help.

[Image: plot in which the x-axis year labels appear as decimals]

This occurs, I think, because the "dates" (whole numbers like 2000, 2001, etc.) are treated as numeric rather than as dates. My main problem, therefore, is to fix the date format inside the dataframe for easy plotting.

If pwt indicates the name of my dataframe, and year indicates the column that stores the "dates", this is what I've tried, with no success:

pwt$year <- strptime(pwt$year, format = "%Y")
pwt$year <- as.Date(as.character(pwt$year), format("%Y"), origin = "1970-01-01")
pwt$year <- as.Date(pwt$year, format='%Y-01-01', origin = "1970-01-01")
pwt$year <- as.yearmon(pwt$year) # requires zoo package

Reproducible Code

Let me now present the data. I will show you steps that should recreate the data.

### Define directories
if (.Platform$OS.type == "windows") {
  currentdir <- "c:/R/pwt"
} else {
  currentdir <- "~/R/pwt"
}
setwd(currentdir)

# download and save data in current directory
download.file("http://www.rug.nl/research/GGDC/data/pwt/V80/pwt80.xlsx", "pwt80.xlsx", mode="wb")
# Edit: binary mode ("wb") is needed, otherwise the xlsx file may be corrupted on download

# convert and save the data sheet in csv format
library(gdata)
installXLSXsupport() # support for xlsx format
DataSheet <- read.xls("pwt80.xlsx", sheet="Data") # load the Data sheet only
write.csv(DataSheet, file=paste("pwt80", "csv", sep="."), row.names=FALSE)

# read pwt80.csv data stored in current directory
pwt80 <- read.csv(paste(currentdir, "pwt80.csv", sep="/"))

# use -subset- to get specific countries and variables.
countries <- c("ESP", "ITA")
variables <- c("country", "countrycode", "year", "rgdpo", "pop")
pwt <- subset(#
  pwt80
  , countrycode %in% countries
  , select = variables
)#

I am now interested in plotting the GDP per Capita for the above subsample of countries. So here is some code that intends to do that.

# Plot data with qplot
library(ggplot2)
qp <- qplot(#
  year
  , rgdpo/pop
  , data = subset(pwt80, countrycode %in% countries)
  , geom = "line"
  , group = countrycode
  , color = as.factor(countrycode)
)#
qp <- qp + 
  xlab("") + 
  ylab("Real GDP Per Capita (international $, 2005 prices, chain)") + 
  theme(legend.title = element_blank()) + 
  coord_trans(y = "log10")

Dates look okay at this point, but things start to go wrong when I "zoom" with xlim and ylim:

qp <- qp + xlim(2000,2010) + ylim(22000,35000)
qp

The same problem exists if I use ggplot instead of qplot.

# Plot data with ggplot
ggp <- ggplot(pwt,aes(x=year,y=rgdpo/pop,color=as.factor(countrycode),group=countrycode)) + 
  geom_line()  
ggp <- ggp + 
  xlab("") + 
  ylab("Real GDP Per Capita (international $, 2005 prices, chain)") + 
  theme(legend.title = element_blank()) + 
  coord_trans(y = "log10")
ggp

ggp <- ggp + xlim(2000,2010) + ylim(22000,35000)
ggp

EDIT: Removed question related to xts objects. Removed the dput() to shorten question.

Answers (1)

Didzis Elferts

The variable year is not treated as a date because it contains only year values; a date also needs a month and a day. In this situation the easiest fix is to keep year numeric, use scale_x_continuous(), and set your own breaks=.

You also mentioned that you want to zoom into the plot. For that you should use coord_cartesian() instead of xlim(): xlim() drops the data outside the range before anything is calculated, whereas coord_cartesian() only changes the visible region.

# qp is the plot as built before xlim() and ylim() were added to it
qp + coord_cartesian(xlim=c(2000,2010), ylim=c(22000,35000)) +
  scale_x_continuous(breaks=seq(2000,2010,2))
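
To see the difference between the two ways of zooming, here is a minimal sketch on made-up data (the toy data frame and its value column are assumptions, not taken from the Penn World Tables). It counts how many x values survive in each plot's layer data.

```r
library(ggplot2)

# Made-up annual series; the numbers are purely illustrative
toy <- data.frame(year = 1995:2012,
                  value = seq(20000, 37000, by = 1000))

p <- ggplot(toy, aes(x = year, y = value)) + geom_line()

# Zoom only: all rows stay in the layer data, and breaks are set by hand
zoomed <- p +
  coord_cartesian(xlim = c(2000, 2010)) +
  scale_x_continuous(breaks = seq(2000, 2010, 2))

# Scale limits: x values outside 2000-2010 are replaced by NA
clipped <- p + xlim(2000, 2010)

# Count the non-missing x values that each plot actually keeps
n_zoomed  <- sum(!is.na(layer_data(zoomed)$x))
n_clipped <- sum(!is.na(layer_data(clipped)$x))
c(zoomed = n_zoomed, clipped = n_clipped)
```

For a plain line this rarely matters visually, but for anything computed from the data (smoothers, summaries) the two approaches can give different pictures.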

If you really need the year values as dates, you can append an arbitrary month and day to each value and then convert the result to a Date object.

pwt$year2<-as.Date(paste0(pwt$year,"-01-01"),format="%Y-%m-%d")
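
A small base-R sketch of that conversion on a made-up vector of years (the years vector is an assumption for illustration). It also shows how to get the year back out of the Date values.

```r
# Made-up years standing in for pwt$year
years <- c(2000L, 2001L, 2005L, 2010L)

# Append an arbitrary month and day, then parse as a Date
dates <- as.Date(paste0(years, "-01-01"), format = "%Y-%m-%d")

class(dates)  # "Date"

# Recover the year, either as a label or as a whole number
year_labels <- format(dates, "%Y")
years_back  <- as.integer(format(dates, "%Y"))
```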

If a Date object is used for the x axis, then the limits passed to xlim= in coord_cartesian() must also be Date objects. To control the x axis formatting, use scale_x_date().

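library(scales)

# Map the Date column year2 (not the numeric year) to x; qp cannot be reused here
ggplot(pwt, aes(x=year2, y=rgdpo/pop, color=countrycode, group=countrycode)) +
  geom_line() +
  coord_cartesian(xlim=as.Date(c("2000-01-01","2010-01-01")), ylim=c(22000,35000)) +
  scale_x_date(breaks=date_breaks("2 years"), labels=date_format("%Y"))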
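
For readers who want to try the date-axis variant without downloading the data, here is a self-contained sketch on made-up numbers (the toy data frame is an assumption). It uses the date_breaks= and date_labels= arguments of scale_x_date(), which play the same role as date_breaks() and date_format() from scales.

```r
library(ggplot2)

# Made-up annual series; the numbers are purely illustrative
toy <- data.frame(year = 1995:2012,
                  value = seq(20000, 37000, by = 1000))

# Turn whole-number years into Date values (1 January of each year)
toy$year2 <- as.Date(paste0(toy$year, "-01-01"), format = "%Y-%m-%d")

# The Date column goes on the x axis, so the zoom limits are Dates too
p_date <- ggplot(toy, aes(x = year2, y = value)) +
  geom_line() +
  coord_cartesian(xlim = as.Date(c("2000-01-01", "2010-01-01")),
                  ylim = c(22000, 35000)) +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y")
p_date
```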
