I have a data frame in R where the rows are gene names and the columns are gene ontology IDs, so that it looks like this:
Gene    V1   V2   V3
Gene 1  GO1  GO2  GO3
Gene 2  GO2
Gene 3  GO2  GO3
I'm trying to rearrange it so that the rows are unique gene ontology IDs, and each gene that matches those IDs is in a separate column in that row:
GO   V1      V2      V3
GO1  Gene 1
GO2  Gene 1  Gene 2  Gene 3
GO3  Gene 1  Gene 3
I looked into reshape2, but it doesn't seem to be useful for this kind of reorganization. Is there a simple way to do this that I'm overlooking?
Thanks for the help!
This can be done with melt/dcast. Convert the 'data.frame' to a 'data.table' (setDT(df1)), reshape it into 'long' format with melt (from data.table), remove the blank rows based on 'GO', and then dcast from 'long' to 'wide':
library(data.table)
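# rowid(GO) numbers the genes within each GO term; ordering by Gene first keeps them in gene order
dcast(melt(setDT(df1), id.vars = "Gene", value.name = "GO")[GO != ""][order(Gene)],
      GO ~ paste0("V", rowid(GO)), value.var = "Gene", fill = "")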
# GO V1 V2 V3
#1: GO1 Gene 1
#2: GO2 Gene 1 Gene 2 Gene 3
#3: GO3 Gene 1 Gene 3
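The same long-then-wide idea can also be sketched in base R if you would rather not depend on data.table. This is a minimal sketch, not part of the original answer: it assumes blank cells are stored as empty strings (NA is also dropped), and the df1 built here is a hypothetical reproducible copy of the question's table. The names go_cols, long, genes_by_go, width, padded and result are illustrative only.

```r
# Hypothetical reproducible copy of the question's table (blanks assumed to be "")
df1 <- data.frame(
  Gene = c("Gene 1", "Gene 2", "Gene 3"),
  V1   = c("GO1", "GO2", "GO2"),
  V2   = c("GO2", "",    "GO3"),
  V3   = c("GO3", "",    ""),
  stringsAsFactors = FALSE
)

go_cols <- setdiff(names(df1), "Gene")

# Long format: one row per (Gene, GO) pair, stacking V1, V2, V3 in turn
long <- data.frame(
  Gene = rep(df1$Gene, times = length(go_cols)),
  GO   = unlist(df1[go_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop empty cells ("" or NA), then sort by gene (order() keeps ties stable)
long <- long[!is.na(long$GO) & long$GO != "", ]
long <- long[order(long$Gene), ]

# Collect the genes for each GO term and pad every row to the same width
genes_by_go <- split(long$Gene, long$GO)
width <- max(lengths(genes_by_go))
padded <- do.call(rbind, lapply(genes_by_go, function(g) c(g, rep("", width - length(g)))))

# Wide format: one row per GO term, genes in columns V1..Vn
result <- data.frame(GO = names(genes_by_go), unname(padded), stringsAsFactors = FALSE)
names(result) <- c("GO", paste0("V", seq_len(width)))
print(result)
```

Using split() plus padding avoids needing a per-group row counter; the trade-off is that it is slower than dcast on large annotation tables.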