Abigail575
Abigail575

Reputation: 175

Y Coordinates of Best Fit Curve in Ggplot2

I have managed to generate pseudotime vs gene expression plots in Monocle for individual markers using the following code:

library("monocle")

lung <- load_lung()
diff_test_res <- differentialGeneTest(
  lung,
  fullModelFormulaStr = "~genotype"
)

ordering_genes <- diff_test_res[diff_test_res$qval < 0.01, "gene_id"]
lung <- setOrderingFilter(lung, ordering_genes)
plot_ordering_genes(lung)
#> Warning: Transformation introduced infinite values in continuous y-axis

lung <- reduceDimension(
  lung,
  max_components = 2,
  method = 'DDRTree'
)
lung <- orderCells(lung)
lung_expressed_genes <-  fData(lung)[fData(lung)$num_cells_expressed >= 5, "gene_id"]
lung_filtered <- lung[lung_expressed_genes, ]
my_genes <- rownames(lung_filtered)[1:3]
lung_subset <- lung_filtered[my_genes, ]

plot_genes_in_pseudotime(lung_subset, color_by = "genotype")

The "plot_genes_in_pseudotime" function on the final line generates a best fit curve of the plotted data. I was wondering if the y coordinates of this curve can somehow be obtained for say, every 0.01 units along the pseudotime axis? You can find the code and example plots here: http://cole-trapnell-lab.github.io/monocle-release/docs/#trajectory-step-3-order-cells-along-the-trajectory

Upvotes: 1

Views: 643

Answers (1)

alan ocallaghan
alan ocallaghan

Reputation: 3038

You can access the Pseudotime and "expectation" values that comprise the curve in plot$data (monocle just plots Pseudotime against spline-smoothed mean expression for the specified genes).

You can then use approxfun to do 2d interpolation and evaluate a grid of points along the range of pseudotime.

NOTE: I am not sure this is a sensible thing to do. Pseudotime is a fairly loose and wooly thing, and reading deeply into minute changes in pseudotime is likely to lead to pretty shaky conclusions.

In any case, if you're interested in using this type of approach I would just read the code on github as it should be fairly easy to reproduce the output.

options(stringsAsFactors = FALSE)
library("monocle")

lung <- load_lung()
#> Removing 4 outliers
diff_test_res <- differentialGeneTest(
  lung,
  fullModelFormulaStr = "~genotype"
)

ordering_genes <- diff_test_res[diff_test_res$qval < 0.01, "gene_id"]
lung <- setOrderingFilter(lung, ordering_genes)
lung <- reduceDimension(
  lung,
  max_components = 2,
  method = 'DDRTree'
)
lung <- orderCells(lung)
lung_expressed_genes <-  fData(lung)[fData(lung)$num_cells_expressed >= 5, "gene_id"]
lung_filtered <- lung[as.character(lung_expressed_genes), ]
my_genes <- rownames(lung_filtered)[1:3]

## Use only 1 gene here. Otherwise the plot data will include multiple genes
lung_subset <- lung_filtered["ENSMUSG00000000031.9", ]

p <- plot_genes_in_pseudotime(lung_subset, color_by = "genotype")
df <- p$data
fun <- approxfun(df$Pseudotime, df$expectation)

s <- seq(min(df$Pseudotime), max(df$Pseudotime), by = 0.01)
plot(s, fun(s))

Upvotes: 1

Related Questions