Batu

Reputation: 13

How to plot multiple boxplots with numeric x values properly in ggplot2?

I am trying to draw a boxplot with one box per tool (3 tools) at each dataset size, like the one below:

[Figure: boxplots of time for each tool, grouped by dataset, with dataset on a discrete x-axis]

ggplot(data1, aes(x = dataset, y = time, color = tool)) + geom_boxplot() + 
  labs(x = 'Datasets', y = 'Seconds', title = 'Time') + 
  scale_y_log10() +  theme_bw()

But I need to put the x-axis on a log scale. For that, I have to convert each dataset label to a number first. Even before applying the log transformation, the plot with numeric x values looks like the one below:

[Figure: the plot produced from data2, with numeric dataset values on the x-axis]

ggplot(data2, aes(x = dataset, y = time, color = tool)) + geom_boxplot() + 
  labs(x = 'Datasets', y = 'Seconds', title = 'Time') + 
  scale_y_log10() + theme_bw()

I checked the boxplot parameters and the grouping parameters of aes, but could not resolve my problem. At first, I thought the problem was caused by the log scaling, but removing the scale calls did not fix it.

What am I missing exactly? Thanks...

Files are in this link. "data2" is the numericized version of "data1".

Upvotes: 1

Views: 3220

Answers (1)

RoB

Reputation: 1984

Your question was a tough cookie, but I learned something new from it!

Using group = dataset alone is not sufficient, because the tool variable also has to be taken into account. After digging around a bit, I found this post which made use of the interaction() function.

This is the trick that was missing. You need group because the x values are numeric rather than a factor, and the grouping must also separate the data by tool. interaction() does that by building a factor with one level for each combination of the two variables.

library(ggplot2)

# This is for pretty-printing the axis labels (e.g. 1000 -> "1k")
my_labs <- function(x){
  paste0(x/1000, "k")
}
# Use the distinct dataset sizes as the x-axis tick positions
levs <- unique(data2$dataset)

# Group by every dataset/tool combination so each one gets its own box
ggplot(data2, aes(x = dataset, y = time, color = tool, 
                  group = interaction(dataset, tool))) + 
  geom_boxplot() + labs(x = 'Datasets', y = 'Seconds', title = 'Time') +
  scale_x_log10(breaks = levs, labels = my_labs) + # define a log scale with your axis ticks
  scale_y_log10() + theme_bw()

This plots

[Figure: boxplots of time per tool at each dataset size, with log-scaled x and y axes]
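
If you want to try the grouping without downloading the files, here is a minimal sketch on made-up data. The data frame toy, its tool names (toolA, toolB, toolC), and the simulated timings are all assumptions for illustration and are not the asker's data; only the group = interaction(dataset, tool) idea comes from the answer above.

```r
library(ggplot2)

# Synthetic stand-in for data2 (made-up values, NOT the asker's files):
# 3 dataset sizes x 3 hypothetical tools x 20 simulated timings each.
set.seed(1)
toy <- expand.grid(
  dataset = c(1000, 10000, 100000),
  tool    = c("toolA", "toolB", "toolC"),
  rep     = 1:20
)
toy$time <- rlnorm(nrow(toy), meanlog = log(toy$dataset / 1000), sdlog = 0.3)

# interaction() returns a factor with one level per dataset/tool combination.
grp <- interaction(toy$dataset, toy$tool)
n_groups <- nlevels(grp)  # 3 sizes x 3 tools

# Same aesthetics as in the answer: numeric x, explicit group per combination.
p <- ggplot(toy, aes(x = dataset, y = time, color = tool,
                     group = interaction(dataset, tool))) +
  geom_boxplot() +
  scale_x_log10(breaks = unique(toy$dataset),
                labels = function(x) paste0(x / 1000, "k")) +
  scale_y_log10() +
  theme_bw()

print(p)
```

Dropping the group aesthetic from this sketch should reproduce the kind of problem described in the question, since ggplot2 then groups only by the discrete variable (tool) and ignores the numeric x values when forming boxes.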

Upvotes: 3
