user3543621

Reputation: 15

How to create a matrix in R using the rows and column data?

I am very new to R and refer to Stack Overflow a lot. I would like to compare each row and column of the data below and create a matrix from it.

data
Line_name   Marker_name    Genodata   Generation
Line_A      Marker_1       AA         F7
Line_A      Marker_2       TT         F7
Line_A      Marker_3       CC         F7
Line_B      Marker_1       TT         F7
Line_B      Marker_3       AT         F6
Line_B      Marker_3       AA         F7
Line_C      Marker_2       AA         F7
Line_C      Marker_2       --         F8
Line_D      Marker_1       --         F7
Line_D      Marker_1       AA         F8
Line_D      Marker_4       AA         F8

I would like to get

         [,Marker_1] [,Marker_2] [,Marker_3] [,Marker_4]
[Line_A]     AA          TT          CC          --
[Line_B]     TT          --          AA          --
[Line_C]     --          AA          --          --
[Line_D]     AA          --          --          AA

Please help me!

I have edited my question to add rules that should be implemented:

Rule 1: If a line has the same marker repeated two or more times and the geno data differ, check the Generation column and pick the latest generation. In this example, Line_B, Marker_3 has two different genodata values, AT (F6) and AA (F7). Based on the generation, AA should override AT.

Rule 2: If a line has the same marker repeated two or more times and the geno data is missing (--) for some of them, the available data should override --, regardless of generation. In this example, Line_C, Marker_2 has genodata AA and --, so AA should override --. Similarly, Line_D, Marker_1 has genodata -- and AA, so AA should override --.

Thanks

Upvotes: 0

Views: 508

Answers (3)

Michael Lawrence

Reputation: 1021

Here's a fun base R one-liner (using dat as defined by Jilber's answer):

with(dat, tapply(as.character(V3), list(V1, V2), identity))

But Matthew's answer is essentially the right approach.
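
As a minimal sketch, the one-liner can also produce the "--" placeholders the question asks for. It assumes the same three-column dat, rebuilt here so the snippet runs on its own, and at most one row per line/marker pair. The tapply result is already a character matrix, so the empty cells (NA) can be replaced directly:

```r
# Same three-column data as in Jilber's answer (one row per line/marker pair)
dat <- read.table(text = "Line_A Marker_1 AA
Line_A Marker_2 TT
Line_A Marker_3 CC
Line_B Marker_1 TT
Line_B Marker_3 AA
Line_C Marker_2 AA
Line_D Marker_1 AA
Line_D Marker_4 AA", header = FALSE, stringsAsFactors = FALSE)

# Rows are lines (V1), columns are markers (V2); empty cells come back as NA
m <- with(dat, tapply(as.character(V3), list(V1, V2), identity))

# Show missing line/marker combinations as "--"
m[is.na(m)] <- "--"
m
```

This relies on identity returning a single value per cell, so it does not cover the repeated markers described in the edited question.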

Upvotes: 0

Jilber Urbina

Reputation: 61214

Your data.frame needs column names (here the defaults V1, V2, V3 assigned by read.table); then use dcast from the reshape2 package:

dat <- read.table(text="Line_A Marker_1 AA
Line_A Marker_2 TT
Line_A Marker_3 CC
Line_B Marker_1 TT
Line_B Marker_3 AA
Line_C Marker_2 AA
Line_D Marker_1 AA
Line_D Marker_4 AA", header=FALSE)


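library(reshape2)
# Cast to wide format: one row per line (V1), one column per marker (V2), cells filled from V3
res <- dcast(dat, V1 ~ V2, value.var = "V3")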
res
#     V1 Marker_1 Marker_2 Marker_3 Marker_4
#1 Line_A       AA       TT       CC     <NA>
#2 Line_B       TT     <NA>       AA     <NA>
#3 Line_C     <NA>       AA     <NA>     <NA>
#4 Line_D       AA     <NA>     <NA>       AA

To get the output in the form you want, use print and set na.print="--":

print(res, na.print="--")
#      V1 Marker_1 Marker_2 Marker_3 Marker_4
#1 Line_A       AA       TT       CC       --
#2 Line_B       TT       --       AA       --
#3 Line_C       --       AA       --       --
#4 Line_D       AA       --       --       AA

Upvotes: 2

Matthew Lundberg

Reputation: 42689

You can create a matrix whose rownames and colnames come from the first two columns of the data (called x here), then use matrix indexing to fill in the data:

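# Empty (NA) matrix: one row per unique line, one column per unique marker
m <- matrix(NA_character_, nrow=length(unique(x[,1])), ncol=length(unique(x[,2])))
rownames(m) <- unique(x[,1])
colnames(m) <- unique(x[,2])
# Each (line, marker) pair in x indexes one cell by row name and column name
m[as.matrix(x[,1:2])] <- as.character(x[,3])
m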
       Marker_1 Marker_2 Marker_3 Marker_4
Line_A "AA"     "TT"     "CC"     NA      
Line_B "TT"     NA       "AA"     NA      
Line_C NA       "AA"     NA       NA      
Line_D "AA"     NA       NA       "AA"    
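
The same matrix-indexing idea can be extended to the two rules in the edited question. This is a rough sketch under stated assumptions, not a tested general solution: the data frame dat and its column names are taken from the question's header, generation labels are assumed to have the form "F" followed by a number, and "--" is assumed to be the only missing-data code.

```r
# Full example data from the question, including the Generation column
dat <- read.table(text = "Line_name Marker_name Genodata Generation
Line_A Marker_1 AA F7
Line_A Marker_2 TT F7
Line_A Marker_3 CC F7
Line_B Marker_1 TT F7
Line_B Marker_3 AT F6
Line_B Marker_3 AA F7
Line_C Marker_2 AA F7
Line_C Marker_2 -- F8
Line_D Marker_1 -- F7
Line_D Marker_1 AA F8
Line_D Marker_4 AA F8", header = TRUE, stringsAsFactors = FALSE)

# Rule 2: drop missing calls first, so any available call beats "--"
called <- dat[dat$Genodata != "--", ]

# Rule 1: turn "F7" into 7 (assumes labels of the form F<number>)
called$gen_num <- as.integer(sub("^F", "", called$Generation))

# Sort so the latest generation comes first within each line/marker pair,
# then keep only the first row of each pair (ties are not resolved further)
called <- called[order(called$Line_name, called$Marker_name, -called$gen_num), ]
best <- called[!duplicated(called[c("Line_name", "Marker_name")]), ]

# Start from a matrix filled with "--" and fill in the chosen calls
line_ids <- unique(dat$Line_name)
marker_ids <- sort(unique(dat$Marker_name))
m <- matrix("--", nrow = length(line_ids), ncol = length(marker_ids),
            dimnames = list(line_ids, marker_ids))
m[cbind(best$Line_name, best$Marker_name)] <- best$Genodata
m
```

Filtering out "--" before picking the latest generation is what lets Line_C, Marker_2 keep AA even though the missing call comes from a later generation.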

Upvotes: 2
