m0byn


ggplot line chart does not show data correctly

I am trying to be as specific as possible. The data I am working with looks like:

      dates bsheet      mro      ciss
1  2008 Oct 490509 3.751000 0.8579982
2  2008 Nov 513787 3.434333 0.9153926
3  2008 Dec 570591 2.718742 0.9145012
4  2009 Jan 534985 2.323581 0.8811410
5  2009 Feb 528390 2.001000 0.8551557
6  2009 Mar 551730 1.662290 0.8286146
7  2009 Apr 514041 1.309333 0.7460113
8  2009 May 486151 1.097774 0.5925725
9  2009 Jun 484629 1.001000 0.5412631
10 2009 Jul 454379 1.001000 0.5398128
11 2009 Aug 458111 1.001000 0.3946989
12 2009 Sep 479956 1.001000 0.2232348
13 2009 Oct 448080 1.001000 0.2961637
14 2009 Nov 427756 1.001000 0.3871220
15 2009 Dec 448548 1.001000 0.3209175

and can be produced via

example <- structure(list(dates = c("2008 Oct", "2008 Nov", "2008 Dec", 
"2009 Jan", "2009 Feb", "2009 Mar", "2009 Apr", "2009 May", "2009 Jun", 
"2009 Jul", "2009 Aug", "2009 Sep", "2009 Oct", "2009 Nov", "2009 Dec"
), bsheet = c(490509, 513787, 570591, 534985, 528390, 551730, 
514041, 486151, 484629, 454379, 458111, 479956, 448080, 427756, 
448548), mro = c(3.751, 3.43433333333333, 2.71874193548387, 2.32358064516129, 
2.001, 1.66229032258065, 1.30933333333333, 1.09777419354839, 
1.001, 1.001, 1.001, 1.001, 1.001, 1.001, 1.001), ciss = c(0.857998173913043, 
0.9153926, 0.914501173913044, 0.881140954545454, 0.85515565, 
0.828614636363636, 0.746011318181818, 0.592572476190476, 0.541263136363636, 
0.539812782608696, 0.394698857142857, 0.223234772727273, 0.296163727272727, 
0.387122047619048, 0.32091752173913)), row.names = c(NA, 15L), class = "data.frame")

The line chart I created using the following code

  library(ggplot2)

  # dates_breaks and dates_labels are defined elsewhere (not shown in the question)
  ciss_plot = ggplot(data = example) + 
    geom_line(aes(x = dates, y = ciss, group = 1)) +
    labs(x = 'Time', y = 'CISS') + 
    scale_x_discrete(breaks = dates_breaks, labels = dates_labels) +
    scale_y_continuous(limits = c(0, 1), breaks = c(seq(0, 0.8, by = 0.2)), expand = c(0, 0)) +
    theme_bw() +
    theme(axis.text.x = element_text(hjust = c(rep(0.5, 11), 0.8, 0.2))) 
  ciss_plot

in ggplot2 looks like this:

[Image: plot created using ggplot2]

whereas plotting the same data with R's standard built-in plot() function

  plot(example$ciss, type = 'l')

results in

[Image: plot created using the default R plot() function]

which obviously is NOT identical!

Could someone please help me out? These plots are already taking me forever and I can't figure out where the problem is. I suspect something is wrong either with "group = 1" or with the data type of the example$dates column!

I am grateful for any constructive input!

Thank you all in advance!

Manuel


Answers (1)

Allan Cameron


Your date column is in character format. This means that ggplot will by default convert it to a factor whose levels are sorted alphabetically (for example, "2008 Dec" is placed before "2008 Nov" and "2008 Oct"), which is why the plot has a different shape. One way to fix this is to make sure the levels are in the correct order before plotting, like this:

library(dplyr)
library(ggplot2)

dates_breaks <- as.character(example$dates)

ggplot(data = example %>% mutate(dates = factor(dates, levels = dates))) + 
    geom_line(aes(x = dates, y = ciss, group = 1)) +
    labs(x = 'Time', y = 'CISS') + 
    scale_x_discrete(breaks = dates_breaks, labels = dates_breaks,
                     guide = guide_axis(n.dodge = 2)) +
    scale_y_continuous(limits = c(0, 1), breaks = c(seq(0, 0.8, by = 0.2)),
                       expand = c(0, 0)) +
    theme_bw()

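To see the ordering problem in isolation, here is a small base R sketch (no ggplot2 needed) using the first four dates and CISS values from the question. It only illustrates how factor() orders the levels by default compared with passing levels = dates as in the fix above; the variable names are illustrative.

```r
# First four rows of the question's data
dates <- c("2008 Oct", "2008 Nov", "2008 Dec", "2009 Jan")
ciss  <- c(0.8579982, 0.9153926, 0.9145012, 0.8811410)

# Default conversion: the levels are sorted alphabetically, not chronologically
levels(factor(dates))
# [1] "2008 Dec" "2008 Nov" "2008 Oct" "2009 Jan"

# Supplying the levels explicitly keeps the original (chronological) row order
levels(factor(dates, levels = dates))
# [1] "2008 Oct" "2008 Nov" "2008 Dec" "2009 Jan"

# The CISS values in the order a default discrete x axis would place them
ciss[order(factor(dates))]
```

The last line shows why the ggplot line looks scrambled compared with plot(example$ciss, type = 'l'), which simply plots the values in row order.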

A smarter way is to convert the date column to actual date-times. This gives you more freedom when plotting and removes the need for a grouping variable altogether:

example <- example %>%
  mutate(dates = as.POSIXct(strptime(paste(dates, "01"), "%Y %b %d")))
  
ggplot(example) + 
  geom_line(aes(x = dates, y = ciss)) +
  labs(x = 'Time', y = 'CISS') + 
  scale_y_continuous(limits = c(0, 1), breaks = c(seq(0, 0.8, by = 0.2)),
                     expand = c(0, 0)) +
  scale_x_datetime(breaks = seq(min(example$dates), max(example$dates), "year"),
                   labels = function(x) strftime(x, "%Y\n%b")) +
  theme_bw() +
  theme(panel.grid.minor.x = element_blank())

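One caveat with the strptime() approach: the %b format matches month abbreviations in the current locale, so on a system whose locale is not English the conversion may return NA. A possible workaround, sketched here as an alternative technique rather than part of the answer's code, is to map the abbreviations with R's built-in month.abb constant (always English) and build Date values directly. The helper variables are illustrative.

```r
# Date strings in the same "YYYY Mon" format as the question
dates <- c("2008 Oct", "2008 Nov", "2008 Dec", "2009 Jan")

# Split each string into its year and month-abbreviation parts
parts <- do.call(rbind, strsplit(dates, " ", fixed = TRUE))
year  <- as.integer(parts[, 1])
month <- match(parts[, 2], month.abb)  # month.abb = "Jan", "Feb", ... regardless of locale

# First day of each month as a Date
parsed <- as.Date(sprintf("%04d-%02d-01", year, month))
parsed
# [1] "2008-10-01" "2008-11-01" "2008-12-01" "2009-01-01"
```

Because this produces a Date column rather than POSIXct, it would be paired with ggplot2's scale_x_date() instead of scale_x_datetime().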
