teunbrand

Reputation: 38043

GRanges as column in base::data.frame

I would like to store a GenomicRanges::GRanges object from Bioconductor as a single column in a base R data.frame. I want a base R data.frame because I'd like to write some ggplot2 functions that work exclusively with data.frames under the hood. However, none of my attempts so far have worked. Basically, this is what I want to do:

library(GenomicRanges)

x <- GRanges(c("chr1:100-200", "chr1:200-300"))

df <- data.frame(x = x, y = 1:2)

But the column is automatically expanded into one column per field, whereas I'd like to keep it as a valid GRanges object in a single column:

> df
  x.seqnames x.start x.end x.width x.strand y
1       chr1     100   200     101        * 1
2       chr1     200   300     101        * 2

An S4Vectors::DataFrame does exactly what I want, but I need a base R data.frame to behave the same way:

> S4Vectors::DataFrame(x = x, y = 1:2)
DataFrame with 2 rows and 2 columns
             x         y
     <GRanges> <integer>
1 chr1:100-200         1
2 chr1:200-300         2

I also tried the following without success:

> df <- data.frame(y = 1:2)
> df[["x"]] <- x
> df
  y                                                           x
1 1 <S4 class ‘GRanges’ [package “GenomicRanges”] with 7 slots>
2 2                                                        <NA>

Warning message: In format.data.frame(if (omit) x[seq_len(n0), , drop = FALSE] else x, : corrupt data frame: columns will be truncated or padded with NAs

df[["x"]] <- I(x)

Error in rep(value, length.out = nrows) : attempt to replicate an object of type 'S4'

I had some minor success implementing an S3 variant of the GRanges class using vctrs::new_rcrd, but that seems a very roundabout way to get a single column representing a genomic range.
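
For reference, a minimal sketch of what such a vctrs record class might look like. The class name `toy_range`, the constructor `new_toy_range()` and the choice of fields are hypothetical and made up for illustration; the record only mimics seqnames/start/end and is not a real GRanges object.

```r
library(vctrs)

# Hypothetical constructor: a record vector with three equal-length fields.
new_toy_range <- function(seqnames = character(), start = integer(), end = integer()) {
  new_rcrd(
    list(seqnames = seqnames, start = start, end = end),
    class = "toy_range"
  )
}

# format() method so each record is rendered as "chr:start-end".
format.toy_range <- function(x, ...) {
  paste0(field(x, "seqnames"), ":", field(x, "start"), "-", field(x, "end"))
}

rng <- new_toy_range(c("chr1", "chr1"), c(100L, 200L), c(200L, 300L))

df <- data.frame(y = 1:2)
df$x <- rng  # stored as a single column of length 2
```

The drawback is the one stated above: strand, seqinfo and metadata columns would all have to be re-implemented by hand.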

Upvotes: 2

Views: 4437

Answers (3)

Z. Zhang

Reputation: 757

I found a very simple way to convert a GRanges object to a data.frame, which you can then work with like any other data.frame: the annoGR2DF function in the Repitools package.

> library(GenomicRanges)
> library(Repitools)
> 
> x <- GRanges(c("chr1:100-200", "chr1:200-300"))
> 
> df <- annoGR2DF(x)
> df
   chr start end width
1 chr1   100 200   101
2 chr1   200 300   101
> class(df)
[1] "data.frame"
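
If installing Repitools is not an option, a similar flat data.frame can probably be obtained with the `as.data.frame()` method for GRanges, and `makeGRangesFromDataFrame()` from GenomicRanges converts it back. A minimal sketch; the variable names `flat` and `gr2` are only illustrative:

```r
library(GenomicRanges)

x <- GRanges(c("chr1:100-200", "chr1:200-300"))

# Flatten to a base data.frame with columns seqnames, start, end, width, strand.
flat <- as.data.frame(x)

# Rebuild a GRanges object from the flat columns when it is needed again.
gr2 <- makeGRangesFromDataFrame(flat)
```

Like annoGR2DF, this spreads the ranges over several columns rather than keeping them in one.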

Upvotes: 4

teunbrand

Reputation: 38043

Since posting this question, I figured out that the crux of my problem is that the format method for S4 objects does not play nicely with data.frames; having a GRanges object as a column isn't necessarily a problem in itself. (Constructing the data.frame with data.frame() still is, though.)

Consider this bit of the original question:

> df <- data.frame(y = 1:2)
> df[["x"]] <- x
> df
  y                                                           x
1 1 <S4 class ‘GRanges’ [package “GenomicRanges”] with 7 slots>
2 2                                                        <NA>

Warning message: In format.data.frame(if (omit) x[seq_len(n0), , drop = FALSE] else x, : corrupt data frame: columns will be truncated or padded with NAs

If we write a simple format method for GRanges, the data.frame prints without the warning:

library(GenomicRanges)

format.GRanges <- function(x, ...) {showAsCell(x)}

df <- data.frame(y = 1:3)

df$x <- GRanges(c("chr1:100-200", "chr1:200-300", "chr2:100-200"))
> df
  y            x
1 1 chr1:100-200
2 2 chr1:200-300
3 3 chr2:100-200

It seems to subset just fine too:

> df[c(1,3),]
  y            x
1 1 chr1:100-200
3 3 chr2:100-200

As a bonus, this seems to work for other S4 classes too, for example:

library(S4Vectors)

format.Rle <- function(x, ...) {showAsCell(x)}

x <- Rle(1:5, 5:1)

df <- data.frame(y = 1:15)
df$x <- x

Upvotes: 0

Angel Garcia Campos

Reputation: 138

A not-so-pretty but practical solution is to use the accessor functions of GenomicRanges and convert each result to the relevant plain vector type, i.e. numeric or character. I used magrittr pipes here, but you can also do it without them.

library(GenomicRanges)
library(magrittr)

x <- GRanges(c("chr1:100-200", "chr1:200-300"))
df <- data.frame(y = 1:2)
df$chr <- seqnames(x) %>% as.character
df$start <- start(x) %>% as.numeric
df$end <- end(x) %>% as.numeric
df$strand <- strand(x) %>% as.character
df$width <- width(x) %>% as.numeric
df
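
The same columns can be built without magrittr by nesting the calls. A sketch using only the accessors shown above; `start()`, `end()` and `width()` already return integer vectors, so they are left unconverted here:

```r
library(GenomicRanges)

x <- GRanges(c("chr1:100-200", "chr1:200-300"))

# One plain column per GRanges field; factor-like Rle values become character.
df <- data.frame(
  y      = 1:2,
  chr    = as.character(seqnames(x)),
  start  = start(x),
  end    = end(x),
  strand = as.character(strand(x)),
  width  = width(x),
  stringsAsFactors = FALSE
)
```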

Upvotes: 0
