I want to build a matrix from this data frame. Each entry should be 1 if the two genes are related (the pair appears as a row in the data) and 0 otherwise. So ADRA1D and ADK would have value 1, as would the other listed pairs. But there is no ADK and AR pair, so that entry should be 0.
tab <- read.table(text="ID gene1 gene2
1 ADRA1D ADK
2 ADRA1B ADK
3 ADRA1A ADK
4 ADRB1 ASIC1
5 ADRB1 ADK
6 ADRB2 ASIC1
7 ADRB2 ADK
8 AGTR1 ACHE
9 AGTR1 ADK
10 ALOX5 ADRB1
11 ALOX5 ADRB2
12 ALPPL2 ADRB1
13 ALPPL2 ADRB2
14 AMY2A AGTR1
15 AR ADORA1
16 AR ADRA1D
17 AR ADRA1B
18 AR ADRA1A
19 AR ADRA2A
20 AR ADRA2B", header=TRUE, stringsAsFactors=FALSE)
Ultimately, I want to build a phylogenetic tree, so I was thinking of starting from a matrix like that. How can I use the reshape library for this, given that I have no value column?
The code below does not work:
library(reshape)
ct=cast(tab,gene1~gene2)
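If it is not mandatory to use reshape, I'd suggest taking a look at igraph.
Here's one way to get the adjacency matrix using the igraph package. We first convert your data frame (the relevant two columns) into an igraph object, and then get.adjacency does the rest. Note that graph.data.frame builds a directed graph by default, so the matrix below is square but not symmetric; pass directed = FALSE to get a symmetric one. Recent igraph releases name these two functions graph_from_data_frame and as_adjacency_matrix.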
library(igraph)
g <- graph.data.frame(tab[,c(2,3)])
get.adjacency(g)
This gives you the adjacency matrix. You should definitely look into using igraph for the rest of your analysis.
16 x 16 sparse Matrix of class "dgCMatrix"
[[ suppressing 16 column names ‘ADRA1D’, ‘ADRA1B’, ‘ADRA1A’ ... ]]
ADRA1D . . . . . . . . . . 1 . . . . .
ADRA1B . . . . . . . . . . 1 . . . . .
ADRA1A . . . . . . . . . . 1 . . . . .
ADRB1 . . . . . . . . . . 1 1 . . . .
ADRB2 . . . . . . . . . . 1 1 . . . .
AGTR1 . . . . . . . . . . 1 . 1 . . .
ALOX5 . . . 1 1 . . . . . . . . . . .
ALPPL2 . . . 1 1 . . . . . . . . . . .
AMY2A . . . . . 1 . . . . . . . . . .
AR 1 1 1 . . . . . . . . . . 1 1 1
ADK . . . . . . . . . . . . . . . .
ASIC1 . . . . . . . . . . . . . . . .
ACHE . . . . . . . . . . . . . . . .
ADORA1 . . . . . . . . . . . . . . . .
ADRA2A . . . . . . . . . . . . . . . .
ADRA2B . . . . . . . . . . . . . . . .
An advantage of using igraph is that many graph-based distance calculations are now available to you. Do look into shortest.paths (called distances in recent igraph releases).
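As a rough illustration of the distance idea, here is a minimal sketch that builds an undirected graph from the question's pairs, extracts a symmetric 0/1 matrix, and computes shortest-path distances. It assumes a recent igraph release (graph_from_data_frame, as_adjacency_matrix, distances). The final hclust step is my own assumption about how one might get a tree-like grouping; it is ordinary hierarchical clustering, not a phylogenetic method.

```r
library(igraph)

# Gene pairs from the question, repeated so the sketch is self-contained.
tab <- read.table(text = "gene1 gene2
ADRA1D ADK
ADRA1B ADK
ADRA1A ADK
ADRB1 ASIC1
ADRB1 ADK
ADRB2 ASIC1
ADRB2 ADK
AGTR1 ACHE
AGTR1 ADK
ALOX5 ADRB1
ALOX5 ADRB2
ALPPL2 ADRB1
ALPPL2 ADRB2
AMY2A AGTR1
AR ADORA1
AR ADRA1D
AR ADRA1B
AR ADRA1A
AR ADRA2A
AR ADRA2B", header = TRUE, stringsAsFactors = FALSE)

# directed = FALSE makes every relation count in both directions.
g <- graph_from_data_frame(tab, directed = FALSE)

# Dense 0/1 adjacency matrix; symmetric because the graph is undirected.
adj <- as_adjacency_matrix(g, sparse = FALSE)

# Pairwise shortest-path lengths (number of edges) between all genes.
d <- distances(g)

# Assumption: plain average-linkage clustering on the path lengths.
hc <- hclust(as.dist(d), method = "average")
```

Calling plot(hc) would draw the resulting dendrogram; whether path length is a meaningful distance for these genes is a modelling decision left to the reader.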
You can achieve this with the table function:
> table(tab$gene1, tab$gene2)
ACHE ADK ADORA1 ADRA1A ADRA1B ADRA1D ADRA2A ADRA2B ADRB1 ADRB2 AGTR1 ASIC1
ADRA1A 0 1 0 0 0 0 0 0 0 0 0 0
ADRA1B 0 1 0 0 0 0 0 0 0 0 0 0
ADRA1D 0 1 0 0 0 0 0 0 0 0 0 0
ADRB1 0 1 0 0 0 0 0 0 0 0 0 1
ADRB2 0 1 0 0 0 0 0 0 0 0 0 1
AGTR1 1 1 0 0 0 0 0 0 0 0 0 0
ALOX5 0 0 0 0 0 0 0 0 1 1 0 0
ALPPL2 0 0 0 0 0 0 0 0 1 1 0 0
AMY2A 0 0 0 0 0 0 0 0 0 0 1 0
AR 0 0 1 1 1 1 1 1 0 0 0 0
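A two-way table is already stored as a matrix, so as.matrix returns it unchanged; use unclass if you want a plain matrix without the table class.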
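One detail worth keeping in mind: table counts occurrences, so a pair listed twice would give 2 rather than 1. The sketch below uses a small made-up subset of the data (the repeated AR/ADORA1 row is an assumption added for illustration) to show one way of dropping the table class and capping the counts at 1.

```r
# Hypothetical subset of the question's data; the last pair is repeated on purpose.
tab <- data.frame(
  gene1 = c("ADRA1D", "ADRB1", "ADRB1", "AR", "AR"),
  gene2 = c("ADK", "ASIC1", "ADK", "ADORA1", "ADORA1"),
  stringsAsFactors = FALSE
)

# Contingency table of pair counts (the repeated pair is counted twice).
counts <- table(tab$gene1, tab$gene2)

# Drop the "table" class to get a plain integer matrix with the same dimnames.
m <- unclass(counts)

# Turn counts into a 0/1 indicator: 1 if the pair occurs at least once.
binary <- (m > 0) * 1L
```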
EDIT: for a square matrix with the same genes on both axes.
To get the same row and column labels from table, both arguments need to contain the same set of values. The values here are strings rather than factors, so there are no levels as such, but the idea is the same: every gene must occur at least once in gene1 and at least once in gene2.
For that, I suggest creating a vector with all your genes (I used sort(unique(c(unique(tab$gene1), unique(tab$gene2))))).
I merged gene1 with this vector, keeping the genes that have no match (all = TRUE), which produces rows filled with NA instead of dropping them. Same thing for gene2.
Now every gene appears at least once in both gene1 and gene2, and you can call table. Note that the result is square but not symmetric, because each pair is counted only in the gene1-to-gene2 direction.
genes <- c('ACHE','ADK','ADORA1','ADRA1A','ADRA1B','ADRA1D','ADRA2A','ADRA2B','ADRB1','ADRB2','AGTR1','ALOX5','ALPPL2','AMY2A','AR','ASIC1')
df <- merge(tab, as.data.frame(genes), by.x = "gene1", by.y = "genes", all = TRUE)
df <- merge(df, as.data.frame(genes), by.x = "gene2", by.y = "genes", all = TRUE)
> table(df$gene1, df$gene2)
ACHE ADK ADORA1 ADRA1A ADRA1B ADRA1D ADRA2A ADRA2B ADRB1 ADRB2 AGTR1 ALOX5 ALPPL2 AMY2A AR ASIC1
ACHE 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ADK 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ADORA1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ADRA1A 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ADRA1B 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ADRA1D 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ADRA2A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ADRA2B 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ADRB1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
ADRB2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
AGTR1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ALOX5 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
ALPPL2 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
AMY2A 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
AR 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0
ASIC1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Hope this helps, though this is probably not the best way to do it.
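A possibly simpler route to the same square table, sketched here as an alternative rather than as part of the approach above, is to convert both columns to factors that share one set of levels. Adding the transpose then makes the matrix symmetric, which the merge-based result is not. The names m and sym are illustrative.

```r
# Gene pairs from the question, repeated so the sketch is self-contained.
tab <- read.table(text = "gene1 gene2
ADRA1D ADK
ADRA1B ADK
ADRA1A ADK
ADRB1 ASIC1
ADRB1 ADK
ADRB2 ASIC1
ADRB2 ADK
AGTR1 ACHE
AGTR1 ADK
ALOX5 ADRB1
ALOX5 ADRB2
ALPPL2 ADRB1
ALPPL2 ADRB2
AMY2A AGTR1
AR ADORA1
AR ADRA1D
AR ADRA1B
AR ADRA1A
AR ADRA2A
AR ADRA2B", header = TRUE, stringsAsFactors = FALSE)

# Every gene that appears in either column, in alphabetical order.
genes <- sort(unique(c(tab$gene1, tab$gene2)))

# Shared factor levels force table() to use all genes on both axes.
m <- unclass(table(factor(tab$gene1, levels = genes),
                   factor(tab$gene2, levels = genes)))

# A pair is related if it occurs in either direction; result is a 0/1 matrix.
sym <- (m + t(m) > 0) * 1L
```

Here sym["ADRA1D", "ADK"] and sym["ADK", "ADRA1D"] are both 1, while sym["ADK", "AR"] is 0, which matches the matrix described in the question.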