Brad

Reputation: 51

R - Make a scatterplot look like a dotplot with counts above and below zero

I'm just learning R, so there's probably an easier way to do this. I have a table of data showing, for a set of stores, each store's change in market share versus the same period a year ago. I've included a link to the first two periods' worth of data.

Sample Data

I currently have a scatterplot that looks like this

[image: current scatterplot]

Each vertical column is a four-week period, and each store is a point positioned by its rank (positive or negative) among the gainers and decliners. This is close to what I'm looking for, but the spacing is uneven and the data points blend into each other. I am trying to build something that looks more like this:

Example

Basically, I want something that looks more like a dotplot, but with counts above and below the line. A scatterplot doesn't seem to be the way to go, but I can't see how to make a dotplot that shows my winners above the line and my losers below it while keeping the zero line consistent across periods. Here's the code I'm using for the scatterplot:

sp1 <- ggplot(store_change_ranked, aes(x=date, y=rank)) +
        geom_point(aes(color = cut(share_chg_yag, c(-Inf, -.1, -.05, -.025, -.015, 0, .015, .025, .05, .1, Inf)))) +
        scale_color_manual(name = "Share Change",
                           values = c("(-Inf,-0.1]" = "red4",
                                      "(-0.1,-0.05]" = "red",
                                      "(-0.05,-0.025]" = "orangered",
                                      "(-0.025,-0.015]" = "darkorange2",
                                      "(-0.015,0]" = "darkorange",
                                      "(0,0.015]" = "greenyellow",
                                      "(0.015,0.025]" = "lightgreen",
                                      "(0.025,0.05]" = "green",
                                      "(0.05,0.1]" = "green2",
                                      "(0.1, Inf]" = "green4"),
                           labels = c("< -10%", " ", "-2.5% to -5.0% ", " ", "0 to -1.5%", "0 to 1.5%", " ", "2.5% to 5.0% ", " ", "10% +")) +
        labs(x = "4-Week Period", title = "Count of Stores Gaining/Losing Share",
             subtitle = "For the 13 periods ending June 2018", y = "# Stores")+
        scale_x_date(date_breaks = "1 month", date_labels = "%m-%y")+
        theme(legend.position = "right", axis.text.y = element_blank(),panel.background=element_blank(),
              panel.grid.major=element_blank(),
              panel.grid.minor=element_blank())

Any help would be appreciated.

Thanks!

Upvotes: 1

Views: 809

Answers (1)

ogustavo

Reputation: 586

As others have pointed out in the comments, your plot is not reproducible, so I can't pinpoint exactly what your problem is. That said, I think that if you follow what I did, you'll be able to plot your data the way you want.

I've simulated some data for my plot, so it won't look exactly like the second picture, but it conveys the same idea. Also, since I don't know what the black and gray points are, I skipped them.

This is the plot I came up with: [image: dotplot of the simulated data]

And this is the code for it:

# **************************************************************************** #
# Simulate Data                                                             ---- 
# **************************************************************************** #

library(ggplot2)

set.seed(123)

create_data <- function(year, month, sector.rising, 
                        max.percent, max.number.sectors) {

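  # Counts run from 0 to max.number.sectors, hence one extra row
  reps <- max.number.sectors + 1

  # Rising sectors (1) stack upward; falling sectors (2) stack downward
  multiplier <- if (sector.rising == 1) 1 else -1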

  tmp <- data.frame(
    Year.Month = factor(rep(paste0(year,",", month),reps)),
    Sector = rep(sector.rising,reps),
    Sector.Count = multiplier*seq(0, max.number.sectors),
    Percent = multiplier*sort(runif(reps,min =0, max = max.percent))
  )

  return(tmp)
}


df.tmp <- NULL

for (k.sector in 1:2){

  for (i.year in 2006:2016){
    for (j.month in 1:12) {

      if (k.sector == 1) { # 1 for rising, 2 for falling
        ran.percent <- runif(1,0,1)
      } else ran.percent <- runif(1,0,1.25)

      ran.number.sectors <- rbinom(1, 20, 0.5)

      tmp <- create_data(year = i.year,
                         month = j.month,
                         sector.rising = k.sector, 
                         max.percent = ran.percent,
                         max.number.sectors = ran.number.sectors
      )

      df.tmp <- rbind(df.tmp, tmp)

    }
  }

}

# **************************************************************************** #
# Plot                                                                      ---- 
# **************************************************************************** #

p <- ggplot(
      data = df.tmp,
      aes(x=Year.Month,
          y=Sector.Count, 
          color = cut(Percent, breaks = seq(-1.25, 1, 0.25), include.lowest = TRUE)
      )
    ) + 
    geom_point(
      size=2,
      alpha = 1,
      pch = 19
    ) +
    scale_x_discrete(
      position = "top",
      breaks = c("2007,1","2008,1","2009,1","2010,1","2011,1",
                 "2012,1","2013,1","2014,1","2015,1"
      ),
      labels = c("2007","2008","2009","2010","2011",
                 "2012","2013","2014","2015"
      ),
      name = ""
      ) +
    scale_y_continuous(
      limits = c(-20,20),
      breaks = seq(-20,20,5),
      labels = as.character(seq(-20,20,5)),
      name = "< SECTORS FALLING       SECTORS RISING >",
      expand = c(0,0)
      ) + 
    scale_color_manual(
      values = c("#d53e4f","#f46d43","#fdae61","#fee08b",
                 "#ffffbf","#e6f598","#abdda4","#66c2a5","#3288bd"),
      name = "", 
      drop = FALSE,
      labels = c("     ",
                 "-1%   ",
                 "     ",
                 "     ",
                 "     ",
                 "0%   ",
                 "     ",
                 "     ",
                 ".75% "),
      guide = guide_legend(
        direction = "horizontal",
        keyheight = unit(2, units = "mm"),
        keywidth = unit(2, units = "mm"),
        nrow = 1,
        byrow = TRUE,
        reverse = FALSE,
        label.position = "bottom",
        override.aes=list(shape=15, cex = 7),
        label.hjust = -0.4,
        title.hjust = 0.5
      )
    ) +
    theme(
      text = element_text(size = 10, color = "#4e4d47"),
      panel.background = element_blank(),
      legend.key.size = unit(1,"mm"),
      legend.position = "top",
      axis.title = element_text(size = 8, color = "#4e4d47"),
      legend.text = element_text(size = 6, color = "#4e4d47")
    )

p 
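
To apply the same idea to data shaped like the question's, the y position can be computed as a signed rank within each period, so gainers stack upward from zero and decliners stack downward. The following is only a minimal sketch: the data frame `store_change`, its values, and the helper `signed_rank` are made up for illustration, and the column names `date` and `share_chg_yag` are assumed from the question's code.

```r
# Hypothetical sample data: one row per store per 4-week period.
# Column names follow the question; the values are invented.
set.seed(1)
store_change <- data.frame(
  store = rep(sprintf("S%02d", 1:12), times = 2),
  date = rep(as.Date(c("2018-05-01", "2018-06-01")), each = 12),
  share_chg_yag = round(runif(24, min = -0.12, max = 0.12), 3)
)

# Signed stacking position within one period (illustrative helper):
#   gainers   get  1,  2,  3, ... (smallest gain nearest the zero line)
#   decliners get -1, -2, -3, ... (smallest loss nearest the zero line)
#   exact zeros stay on the zero line
signed_rank <- function(chg) {
  pos <- integer(length(chg))
  up <- chg > 0
  down <- chg < 0
  pos[up] <- rank(chg[up], ties.method = "first")
  pos[down] <- -rank(-chg[down], ties.method = "first")
  pos
}

# Apply the ranking separately to each period.
store_change$rank <- ave(
  store_change$share_chg_yag,
  factor(store_change$date),
  FUN = signed_rank
)
```

Because the positions are consecutive integers, the points sit exactly one unit apart, and the zero line is the same in every period. The result can then be mapped with `aes(x = date, y = rank)` in `geom_point()`, as in the plot code above, with `geom_hline(yintercept = 0)` added if a visible zero line is wanted.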

Upvotes: 1
