I have a data file that records which genes are present in which strains, one gene/strain pair per row:
| gene name | strain |
|---|---|
| BCAL0113 | MS2140 |
| BCAL0113 | VC9970 |
| BCAL0113 | VC9872 |
| BCAL0113 | VC9842 |
| BCAL0113 | VC9789 |
| BCAL0113 | VC9670 |
| BCAL0113 | VC9612 |
| BCAL0114 | VC9444 |
| BCAL0114 | VC8412 |
| BCAL0114 | VC8319 |
| BCAL0114 | VC7880 |
| BCAL0115 | VC7879 |
| BCAL0115 | VC7723 |
| BCAL0116 | VC7722 |
| BCAL0116 | VC7718 |
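I want to create a matrix with gene names as the row names and strains as the column names, where each cell is 1 if the gene is present in that strain and 0 if it is absent. Something like this (only some of the strain columns are shown):

| gene name | MS2140 | VC9970 | VC9872 | VC9842 | VC8319 | VC7880 | VC7879 | VC7723 | VC7722 | VC7718 |
|---|---|---|---|---|---|---|---|---|---|---|
| BCAL0113 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| BCAL0114 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| BCAL0115 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| BCAL0116 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |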
In short: how can I turn the two-column gene/strain list into this gene-by-strain presence/absence matrix?
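    # One row per (gene, strain) pair: matrix() fills column-wise, so t() gives a 15 x 2 matrix
    m1 <- t(matrix(c('BCAL0113', 'MS2140', 'BCAL0113', 'VC9970', 'BCAL0113', 'VC9872', 'BCAL0113', 'VC9842', 'BCAL0113', 'VC9789', 'BCAL0113', 'VC9670', 'BCAL0113', 'VC9612', 'BCAL0114', 'VC9444', 'BCAL0114', 'VC8412', 'BCAL0114', 'VC8319', 'BCAL0114', 'VC7880', 'BCAL0115', 'VC7879', 'BCAL0115', 'VC7723', 'BCAL0116', 'VC7722', 'BCAL0116', 'VC7718'), nrow = 2))
    df <- setNames(as.data.frame(m1), c("gene", "strain"))
    # Cross-tabulate: rows are genes, columns are strains, cells count occurrences
    t1 <- table(df); t1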
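Note that `table()` returns counts, not strict presence/absence: if the same gene/strain pair appeared twice in the file, that cell would be 2. A minimal sketch that forces 0/1 and returns a plain matrix, assuming the data are already in a data frame with columns named `gene` and `strain` (those column names, and the commented-out `read.table` call with its file name, are my own assumptions, not from the question):

```r
# Assumed input: one row per (gene, strain) pair. With a real file this might be
# df <- read.table("genes.tsv", header = TRUE, sep = "\t")  # hypothetical file name
df <- data.frame(
  gene   = rep(c("BCAL0113", "BCAL0114", "BCAL0115", "BCAL0116"), times = c(7, 4, 2, 2)),
  strain = c("MS2140", "VC9970", "VC9872", "VC9842", "VC9789", "VC9670", "VC9612",
             "VC9444", "VC8412", "VC8319", "VC7880",
             "VC7879", "VC7723",
             "VC7722", "VC7718")
)

# table() counts how often each (gene, strain) pair occurs
counts <- table(df$gene, df$strain)

# Drop the "table" class and turn any count > 0 into 1, so duplicated rows
# in the input still give a strict 0/1 presence/absence matrix
pa <- 1L * (unclass(counts) > 0)

print(pa)
```

Because `pa` is an ordinary integer matrix with gene row names and strain column names, it can be saved with, for example, `write.csv(pa, "presence_absence.csv")` (file name hypothetical).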