haggis

Reputation: 417

x-axis in reverse order

I've used the following approach to create three histograms without any trouble. The fourth one suddenly shows the x-axis in reverse order, yet nothing in the snippet (at least nothing I know of) should affect the order.

The x-axis is expected to start with the lowest value on the left.

Here's the R code:

df <- mydata %>% mutate(length.class=cut(mydata$count,breaks = c(1,10,100,1000,10000,100000,1000000,10000000),include.lowest=TRUE,dig.lab=8)) %>% group_by(length.class) %>% summarise(count = n())
dftext <- as.data.frame(table(df$length.class))
colnames(dftext)[1] <- "x"
dftext$lab[dftext$x == "[1,10]"] <- 1063393
dftext$lab[dftext$x == "(10,100]"] <- 65986
dftext$lab[dftext$x == "(100,1000]"] <- 3206
dftext$lab[dftext$x == "(1000,10000]"] <- 386
dftext$lab[dftext$x == "(10000,100000]"] <- 32
dftext$lab[dftext$x == "(100000,1000000]"] <- 0
dftext$lab[dftext$x == "(1000000,10000000]"] <- 1

df$count[df$length.class == "(1000000,10000000]"] <- 1.1  # To make its bar visible

fmt <- function(decimals=0){
    function(x) format(x,scientific = FALSE)
}

ggplot(df,aes(length.class,count)) + geom_bar(stat = "identity",width=0.9,fill="#999966") + scale_y_log10(labels = fmt()) + labs(x="", y="") + geom_text(data=dftext, aes(x=x, y=2, label=lab), size = 6) + theme(text = element_text(size=20)) +
    theme(axis.line = element_line(colour = "black"),
          panel.grid.major = element_line(color = "grey"),
          panel.grid.minor = element_line(color = "grey"),
          panel.background = element_blank(),
          axis.title.x = element_text(margin=margin(t = 15, unit = "pt")),
          axis.text.x = element_text(angle = 45, hjust = 1))

What is causing the reverse order and how can I get rid of it?

Edit: You guys are fast! :) The answer from @mark-peterson looks pretty solid, but I couldn't get it to work. Here's the requested data: mydata.csv

Upvotes: 1

Views: 2781

Answers (2)

Mark Peterson

Reputation: 9560

When given text labels, geom_bar converts them to a factor and sorts the bars. My guess is that the alphabetical and numerical orders matched up for your previous uses, but did not for this one. I thought that @Pierre was right about scale_x_reverse(), but it doesn't appear to work on factors. Instead, you will need to set the factor order yourself. Without sample data, it is hard to help with that.
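As a minimal sketch of setting the order by hand (the labels below are made up for illustration, not taken from the asker's data), passing `levels` to `factor()` fixes the order explicitly instead of leaving it to the default string sort:

```r
# Hypothetical bin labels, listed in the order they should appear on the axis
bin_labels <- c("[1,10]", "(10,100]", "(100,1000]", "(1000,10000]")

# Without `levels`, factor() sorts the unique labels as strings,
# which need not match their numeric order
default_order <- levels(factor(bin_labels))

# With `levels`, the given order is kept as-is
ordered_bins <- factor(bin_labels, levels = bin_labels)
levels(ordered_bins)
```

A discrete x-axis in ggplot2 follows the level order of the factor, so a column built this way should plot in the intended order.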

A better question, however, is why you are doing so much work by hand here. The tools exist to automate much of your set up, with the added benefit of reducing errors and sorting the factor correctly. For example, with some reproducible data:

library(ggplot2)  # ggplot(), geom_bar(), geom_text()
library(dplyr)    # provides the %>% pipe

temp <- data.frame(a = 1:999)

temp$binned <-
  cut(temp$a, 10^(0:3), include.lowest = TRUE)

forText <-
  table(temp$binned) %>%
  as.data.frame()

ggplot(temp, aes(x = binned)) +
  geom_bar() +
  geom_text(data = forText
            , aes(x = Var1
                  , y = 75
                  , label = Freq))

If you just want a picture of the distribution, you can be even faster with a histogram:

ggplot(temp, aes(a)) +
  geom_histogram() +
  scale_x_log10()

(Also, in the future, try to strip down to an MWE -- no need to include lots of theme settings if they are not germane to the problem.)

Using the posted data, I got the plot to work with my approach above. Note that you would need to add the additional theme and scale arguments. You also need to make use of @aosmith's answer about the missing value. (Which, I think, means that @aosmith's answer actually answers your question, while mine may be just good advice for how to do this more quickly.)

mydata$binned <-
  cut(mydata$count,breaks = c(1,10,100,1000,10000,100000,1000000,10000000),include.lowest=TRUE,dig.lab=8)

forText <-
  table(mydata$binned) %>%
  as.data.frame()

ggplot(mydata, aes(x = binned)) +
  geom_bar() +
  geom_text(data = forText
            , aes(x = Var1
                  , y = 75
                  , label = Freq)) +
  scale_x_discrete(drop = FALSE)

Upvotes: 1

aosmith

Reputation: 36076

Your two datasets have the same levels for the factors length.class and x, but there is no row for (100000,1000000] in your first dataset, df. This is because summarise has no drop = FALSE option to keep all levels of a factor in the dataset regardless of whether they have any observations.
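A small sketch of that difference, using made-up values rather than the posted data: the summarised data frame ends up with a row only for bins that occur, while `table()` reports every level, including empty ones.

```r
library(dplyr)

# Hypothetical values: nothing falls into (10,100] or (100,1000]
x <- c(3, 7, 2500)
binned <- cut(x, breaks = c(1, 10, 100, 1000, 10000),
              include.lowest = TRUE, dig.lab = 8)

# group_by() + summarise() returns rows only for bins that actually occur
summarised <- data.frame(binned = binned) %>%
  group_by(binned) %>%
  summarise(count = n())

# table() keeps every level and reports the empty ones with Freq 0
tab <- as.data.frame(table(binned))

nrow(summarised)            # rows present after summarising
nlevels(summarised$binned)  # levels still defined on the factor
nrow(tab)                   # rows in the table-based version
```

In dplyr 0.8.0 and later, `group_by()` accepts `.drop = FALSE`, which should keep the empty groups; whether that is available depends on the installed version.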

Because you built the plot from the dataset in which fewer factor levels actually occur in the rows, it looks like ggplot2 gets confused when you add the new layer that has more factor levels, and things get ordered oddly.

A fix is to make sure the x axis doesn't drop any factor levels by using drop = FALSE in scale_x_discrete. That way you will be working with the same factor levels for the x axis for both datasets and things won't get mis-ordered.

+ scale_x_discrete(drop = FALSE)
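For context, a self-contained sketch of where that line goes. The toy data frames below are hypothetical stand-ins for df and dftext, and geom_col() is used in place of geom_bar(stat = "identity"):

```r
library(ggplot2)

# Hypothetical bins: the middle one has no row in the bar data
bins <- c("[1,10]", "(10,100]", "(100,1000]")

bar_data <- data.frame(
  length.class = factor(c("[1,10]", "(100,1000]"), levels = bins),
  count = c(50, 2)
)

# The label data has a row for every bin, including the empty one
label_data <- data.frame(
  x = factor(bins, levels = bins),
  lab = c(50, 0, 2)
)

p <- ggplot(bar_data, aes(length.class, count)) +
  geom_col() +
  geom_text(data = label_data, aes(x = x, y = 1, label = lab)) +
  scale_x_discrete(drop = FALSE)  # keep unused levels so both layers share one axis order

# print(p) draws the plot
```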

Upvotes: 3