Reputation: 63
I am trying to be as specific as possible. The data I am working with looks like:
      dates bsheet      mro      ciss
1  2008 Oct 490509 3.751000 0.8579982
2  2008 Nov 513787 3.434333 0.9153926
3  2008 Dec 570591 2.718742 0.9145012
4  2009 Jan 534985 2.323581 0.8811410
5  2009 Feb 528390 2.001000 0.8551557
6  2009 Mar 551730 1.662290 0.8286146
7  2009 Apr 514041 1.309333 0.7460113
8  2009 May 486151 1.097774 0.5925725
9  2009 Jun 484629 1.001000 0.5412631
10 2009 Jul 454379 1.001000 0.5398128
11 2009 Aug 458111 1.001000 0.3946989
12 2009 Sep 479956 1.001000 0.2232348
13 2009 Oct 448080 1.001000 0.2961637
14 2009 Nov 427756 1.001000 0.3871220
15 2009 Dec 448548 1.001000 0.3209175
and can be produced via
example <- structure(list(dates = c("2008 Oct", "2008 Nov", "2008 Dec",
"2009 Jan", "2009 Feb", "2009 Mar", "2009 Apr", "2009 May", "2009 Jun",
"2009 Jul", "2009 Aug", "2009 Sep", "2009 Oct", "2009 Nov", "2009 Dec"
), bsheet = c(490509, 513787, 570591, 534985, 528390, 551730,
514041, 486151, 484629, 454379, 458111, 479956, 448080, 427756,
448548), mro = c(3.751, 3.43433333333333, 2.71874193548387, 2.32358064516129,
2.001, 1.66229032258065, 1.30933333333333, 1.09777419354839,
1.001, 1.001, 1.001, 1.001, 1.001, 1.001, 1.001), ciss = c(0.857998173913043,
0.9153926, 0.914501173913044, 0.881140954545454, 0.85515565,
0.828614636363636, 0.746011318181818, 0.592572476190476, 0.541263136363636,
0.539812782608696, 0.394698857142857, 0.223234772727273, 0.296163727272727,
0.387122047619048, 0.32091752173913)), row.names = c(NA, 15L), class = "data.frame")
The line chart I created with ggplot2 using the following code
# dates_breaks and dates_labels are defined elsewhere in my script (not shown)
ciss_plot = ggplot(data = example) +
  geom_line(aes(x = dates, y = ciss, group = 1)) +
  labs(x = 'Time', y = 'CISS') +
  scale_x_discrete(breaks = dates_breaks, labels = dates_labels) +
  scale_y_continuous(limits = c(0, 1), breaks = c(seq(0, 0.8, by = 0.2)), expand = c(0, 0)) +
  theme_bw() +
  theme(axis.text.x = element_text(hjust = c(rep(0.5, 11), 0.8, 0.2)))
ciss_plot
looks like:
whereas plotting the same data with R's standard built-in plot() function using
plot(example$ciss, type = 'l')
results in
which obviously is NOT identical!
Could someone please help me out? These plots have already taken me forever, and I cannot figure out where the problem is. I suspect something is wrong either with "group = 1" or with the data type of the example$dates column!
I am thankful for any constructive input!!
Thank you all in advance!
Manuel
Upvotes: 0
Views: 623
Reputation: 173793
Your dates column is in character format. This means that ggplot will by default treat it like a factor and arrange its values in alphabetical order, which is why the plot has a different shape. One way to fix this is to ensure you have the levels in the correct order before plotting, like this:
library(dplyr)
library(ggplot2)
# Use the original (chronological) strings as axis breaks and labels
dates_breaks <- as.character(example$dates)
# Fix the factor levels in row order so the x axis is not sorted alphabetically
ggplot(data = example %>% mutate(dates = factor(dates, levels = dates))) +
  geom_line(aes(x = dates, y = ciss, group = 1)) +
  labs(x = 'Time', y = 'CISS') +
  scale_x_discrete(breaks = dates_breaks, labels = dates_breaks,
                   guide = guide_axis(n.dodge = 2)) +
  scale_y_continuous(limits = c(0, 1), breaks = c(seq(0, 0.8, by = 0.2)),
                     expand = c(0, 0)) +
  theme_bw()
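To check the ordering yourself, here is a small base-R sketch. The three-value vector x is made up for illustration and is not your full data. It shows the alphabetical default that a character axis ends up with, and how supplying the levels in order of appearance preserves the original sequence:

# Hypothetical three-month excerpt, already in chronological order
x <- c("2008 Oct", "2008 Nov", "2008 Dec")

# Default factor levels are sorted alphabetically, so "2008 Dec" comes first
alphabetical <- levels(factor(x))

# Supplying the levels in order of appearance keeps the chronology
chronological <- levels(factor(x, levels = unique(x)))

print(alphabetical)
print(chronological)

Using unique(x) rather than x itself is a small safeguard, since factor() does not accept duplicated levels.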
A more robust approach is to convert the dates column to actual date-times, which gives you more control over the axis and means you don't need a grouping variable at all:
# Append a dummy day so each "YYYY Mon" string parses as a full date-time
# (note that %b depends on the locale's month abbreviations)
example <- example %>%
  mutate(dates = as.POSIXct(strptime(paste(dates, "01"), "%Y %b %d")))
# With a continuous date-time axis, no group aesthetic is needed
ggplot(example) +
  geom_line(aes(x = dates, y = ciss)) +
  labs(x = 'Time', y = 'CISS') +
  scale_y_continuous(limits = c(0, 1), breaks = c(seq(0, 0.8, by = 0.2)),
                     expand = c(0, 0)) +
  scale_x_datetime(breaks = seq(min(example$dates), max(example$dates), "year"),
                   labels = function(x) strftime(x, "%Y\n%b")) +
  theme_bw() +
  theme(panel.grid.minor.x = element_blank())
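If you don't need time-of-day resolution, a plain Date column may be enough. The following is only a sketch under two assumptions: every value follows the "YYYY Mon" pattern, and the month abbreviations are English (the code forces the "C" time locale for that reason). The names d and parsed are purely illustrative:

# Assumption: force English month abbreviations so "%b" can parse "Oct", "Nov", ...
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical excerpt of the dates column
d <- c("2008 Oct", "2008 Nov", "2008 Dec", "2009 Jan")

# Append a dummy day so each string is a complete date, then parse it
parsed <- as.Date(paste(d, "01"), format = "%Y %b %d")

# A failed parse yields NA, so check before plotting
stopifnot(!anyNA(parsed))

# Dates increase chronologically, whereas the raw strings are not in sorted order
dates_increasing <- all(diff(parsed) > 0)
strings_unsorted <- is.unsorted(d)
print(dates_increasing)
print(strings_unsorted)

With a Date column, scale_x_date() would take the place of scale_x_datetime() in the plot above.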
Upvotes: 1