K_081

Reputation: 35

How to set the y-axis title and change the font style in an UpSetR plot?

I am trying to plot an intersection graph using UpSetR for an Orthogroup gene count dataset; a sample is reproduced in the code below.

I need to highlight certain intersections. The names 'Mdong', 'Mfastidious', etc. need to be in italics because they are names of bacteria, and the y-axis title has to be changed to 'Number of genes shared'. I used the following code and was able to highlight the intersections, but I don't know how to add the italics and the y-axis title. I tried including ggplot2 as well, but it did not work.

library(UpSetR)
library(ggplot2)

mydataframe <- data.frame(Orthogroup = c("OG0000000", "OG0000001", "OG0000002", "OG0000003", "OG0000004", "OG0000005", "OG0000006", "OG0000007", "OG0000008", "OG0000009", "OG0000010", "OG0000011", "OG0000012"), Mdong = c(9,3,7,0,3,0,5,3,8,14,0,4,6), Midriensis = c(6,8,9,0,6,0,5,6,6,4,0,8,9), Mcrassostreae = c(4,7,3,5,11,3,9,6,10,3,0,4,5))

selected_species <- colnames(mydataframe)[2:(ncol(mydataframe))]

The set columns have to be binary (0/1) integers, so -

mydataframe[mydataframe > 0] <- 1

Plot -

upset(mydataframe, sets = rev(selected_species), keep.order = T, order.by = "freq", queries=list(list(query = intersects, params = list("Mdong","Midriensis"), color = "red", active = T)), ylab("Number of genes shared"))

This highlights the intersections as intended, but the 'ylab()' call is ignored. I also looked at the ComplexUpset examples by krassowski, but I don't think I know how to use UpSetR or ComplexUpset properly. How do I do all of this within a single command? Thanks in advance!

Upvotes: 0

Views: 487

Answers (1)

Carl

Reputation: 7540

The supplied code wouldn't run for me, so I'm not sure if this is what you intended. Nonetheless, it does show how to put selected names in italics (Mdong in this example) and how to set a y-axis label. I've used ggupset because it plays nicely with ggplot2.

library(tidyverse)
library(ggupset)
# to enable italics
library(ggtext)

mydataframe <-
  data.frame(
    Orthogroup = c(
      "OG0000000",
      "OG0000001",
      "OG0000002",
      "OG0000003",
      "OG0000004",
      "OG0000005",
      "OG0000006",
      "OG0000007",
      "OG0000008",
      "OG0000009",
      "OG0000010",
      "OG0000011",
      "OG0000012"
    ),
    Mdong = c(9, 3, 7, 0, 3, 0, 5, 3, 8, 14, 0, 4, 6),
    Midriensis = c(6, 8, 9, 0, 6, 0, 5, 6, 6, 4, 0, 8, 9),
    Mcrassostreae = c(4, 7, 3, 5, 11, 3, 9, 6, 10, 3, 0, 4, 5)
  )

set_df <- mydataframe |>
  pivot_longer(-Orthogroup) |> 
  distinct(value, name) |> 
  arrange(value, name) |> 
  mutate(
    # select the names to be markdown formatted, e.g. italics
    name = if_else(name %in% c("Mdong"), str_c("*", name, "*"), name),
    gene = list(name), 
    .by = value
    ) |>
  distinct(gene, value)

label_df <- set_df |> count(gene)

set_df |>
  ggplot(aes(gene)) +
  geom_bar() +
  geom_label(aes(y = n, label = n), data = label_df) +
  scale_x_upset() +
  labs(
    # with y-axis label
    x = "Combinations", y = "Number of genes shared",
    title = "Most Frequent Gene Combinations"
  ) +
  # to enable italics
  theme(axis.text.y = element_markdown())

Created on 2024-03-13 with reprex v2.1.0
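
If you would rather stay with UpSetR, the y-axis title of the main bar chart is set through the mainbar.y.label argument of upset() rather than through a ggplot2 ylab() call. The following is only a minimal sketch under some assumptions: the names species and binary_df are mine, the 0/1 conversion is restricted to the species columns so that Orthogroup stays untouched, and the queries argument is left out because the sample rows contain no intersection of only Mdong and Midriensis. It does not address the italics.

```r
library(UpSetR)

# Same sample data as in the question
mydataframe <- data.frame(
  Orthogroup = sprintf("OG%07d", 0:12),
  Mdong = c(9, 3, 7, 0, 3, 0, 5, 3, 8, 14, 0, 4, 6),
  Midriensis = c(6, 8, 9, 0, 6, 0, 5, 6, 6, 4, 0, 8, 9),
  Mcrassostreae = c(4, 7, 3, 5, 11, 3, 9, 6, 10, 3, 0, 4, 5)
)

# Assumption: every column except Orthogroup is a species (set) column
species <- setdiff(colnames(mydataframe), "Orthogroup")

# Turn counts into presence/absence (0/1), species columns only
binary_df <- mydataframe
binary_df[species] <- lapply(binary_df[species], function(x) as.integer(x > 0))

upset(
  binary_df,
  sets = rev(species),
  keep.order = TRUE,
  order.by = "freq",
  # y-axis title of the intersection-size bar chart
  mainbar.y.label = "Number of genes shared"
)
```

As far as I know, upset() does not expose a font-face option for the set names, which is why the ggupset route above is used for the italics.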

Upvotes: 1
