Reshape data in R

Question

I want to build a matrix from this data frame. Each cell should be 1 if there is a relation between that pair of genes and 0 if there is not. So ADRA1D and ADK would have the value 1, and so would the other pairs listed. But there is no ADK and AR pair, so that cell of the matrix should be 0.

tab <- read.table(text="ID  gene1   gene2
1   ADRA1D  ADK
2   ADRA1B  ADK
3   ADRA1A  ADK
4   ADRB1   ASIC1
5   ADRB1   ADK
6   ADRB2   ASIC1
7   ADRB2   ADK
8   AGTR1   ACHE
9   AGTR1   ADK
10  ALOX5   ADRB1
11  ALOX5   ADRB2
12  ALPPL2  ADRB1 
13  ALPPL2  ADRB2
14  AMY2A   AGTR1
15  AR  ADORA1
16  AR  ADRA1D
17  AR  ADRA1B
18  AR  ADRA1A
19  AR  ADRA2A
20  AR  ADRA2B", header=TRUE, stringsAsFactors=FALSE)

Ultimately I want to build a phylogenetic tree, so I was thinking of starting from a matrix like that. How can I use the reshape library for this, given that I have no value column?

The code below does not work:

library(reshape)
ct=cast(tab,gene1~gene2)

Ram Narasimhan · Accepted Answer

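If it is not mandatory to use reshape, I'd suggest taking a look at igraph. Here's one way to get the adjacency matrix using the igraph package. We first convert the two relevant columns of your data frame into an igraph object, and then get.adjacency does the rest. Note that graph.data.frame builds a directed graph by default, so each pair gets a 1 only in its gene1 row and gene2 column; pass directed=FALSE if you need the matrix to be symmetric.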

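library(igraph)
# Each row of the two gene columns becomes one edge of the graph
g <- graph.data.frame(tab[, c("gene1", "gene2")])
# Adjacency matrix: rows are gene1, columns are gene2
get.adjacency(g)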

This gives you the adjacency matrix, printed as a sparse matrix in which the dots stand for zeros. You should definitely look into using igraph for the rest of your analysis.

16 x 16 sparse Matrix of class "dgCMatrix"
   [[ suppressing 16 column names ‘ADRA1D’, ‘ADRA1B’, ‘ADRA1A’ ... ]]

ADRA1D . . . . . . . . . . 1 . . . . .
ADRA1B . . . . . . . . . . 1 . . . . .
ADRA1A . . . . . . . . . . 1 . . . . .
ADRB1  . . . . . . . . . . 1 1 . . . .
ADRB2  . . . . . . . . . . 1 1 . . . .
AGTR1  . . . . . . . . . . 1 . 1 . . .
ALOX5  . . . 1 1 . . . . . . . . . . .
ALPPL2 . . . 1 1 . . . . . . . . . . .
AMY2A  . . . . . 1 . . . . . . . . . .
AR     1 1 1 . . . . . . . . . . 1 1 1
ADK    . . . . . . . . . . . . . . . .
ASIC1  . . . . . . . . . . . . . . . .
ACHE   . . . . . . . . . . . . . . . .
ADORA1 . . . . . . . . . . . . . . . .
ADRA2A . . . . . . . . . . . . . . . .
ADRA2B . . . . . . . . . . . . . . . .
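
The matrix above is directed (the AR row has entries, the AR column has none). If an ordinary symmetric 0/1 matrix is what you need, the following is a minimal sketch of one way to get it. It assumes a recent igraph release, where graph_from_data_frame and as_adjacency_matrix are the current names for graph.data.frame and get.adjacency, and it uses only a handful of rows from the question so that it runs on its own.

```r
library(igraph)

# A few rows from the question's data, repeated here so the sketch is self-contained
tab <- data.frame(
  gene1 = c("ADRA1D", "ADRB1", "ADRB1", "AR", "AR"),
  gene2 = c("ADK", "ASIC1", "ADK", "ADRA1D", "ADORA1"),
  stringsAsFactors = FALSE
)

# directed = FALSE treats each pair as unordered, so the matrix comes out symmetric
g <- graph_from_data_frame(tab[, c("gene1", "gene2")], directed = FALSE)

# sparse = FALSE returns a plain base R matrix of 0s and 1s with gene names as dimnames
m <- as_adjacency_matrix(g, sparse = FALSE)

m["ADRA1D", "ADK"]  # pair present in the data
m["ADK", "AR"]      # pair absent from the data
isSymmetric(m)
```

If the same pair appears more than once in the data, an undirected graph will count it more than once, so you may want to remove duplicate rows first.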

An advantage of using igraph is that many graph-based distance calculations are now available to you. Do look into shortest.paths (named distances in recent igraph versions).
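
As a rough illustration of the distance idea, here is a sketch that computes the number of edges on the shortest path between every pair of genes and feeds that into hierarchical clustering. The subset of rows is only there to keep the example self-contained. Treating hop counts as a distance for a tree, and using hclust for it, are assumptions made for illustration and may not suit your biology.

```r
library(igraph)

# A few rows from the question's data, repeated here so the sketch is self-contained
tab <- data.frame(
  gene1 = c("ADRA1D", "ADRB1", "ADRB1", "AR", "AR"),
  gene2 = c("ADK", "ASIC1", "ADK", "ADRA1D", "ADORA1"),
  stringsAsFactors = FALSE
)

# Undirected graph, so a path may follow a pair in either direction
g <- graph_from_data_frame(tab, directed = FALSE)

# Matrix of shortest-path lengths (number of edges) between all genes
d <- distances(g)

d["ADRA1D", "ADK"]  # directly related
d["ADK", "AR"]      # related only through ADRA1D

# Hypothetical next step: cluster genes on these hop counts.
# Unconnected genes would get Inf here, which hclust cannot handle.
hc <- hclust(as.dist(d))
```

plot(hc) would then draw the resulting dendrogram.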
