Reputation: 15
I am very new to R and refer to Stack Overflow a lot. I would like to reshape the data below into a matrix with one row per line and one column per marker.
data
Line_name Marker_name Genodata Generation
Line_A Marker_1 AA F7
Line_A Marker_2 TT F7
Line_A Marker_3 CC F7
Line_B Marker_1 TT F7
Line_B Marker_3 AT F6
Line_B Marker_3 AA F7
Line_C Marker_2 AA F7
Line_C Marker_2 -- F8
Line_D Marker_1 -- F7
Line_D Marker_1 AA F8
Line_D Marker_4 AA F8
I would like to get
[,Marker_1] [,Marker_2] [,Marker_3] [,Marker_4]
[Line_A] AA TT CC --
[Line_B] TT -- AA --
[Line_C] -- AA -- --
[Line_D] AA -- -- AA
Please help me!
I have edited my question to add rules that should be implemented:
Rule 1: If a line has the same marker repeated two or more times and the geno data differ, check the Generation column and pick the latest generation. In this example, Line_B has two different genodata values for Marker_3, AT (F6) and AA (F7). Based on the generation, AA should override AT.
Rule 2: If a line has the same marker repeated two or more times and the geno data is missing (--) in some of the rows, the available data should override -- regardless of generation. In this example, Line_C has genodata AA and -- for Marker_2, so AA should override --. Similarly, Line_D has genodata -- and AA for Marker_1, so AA should override --.
Thanks
Upvotes: 0
Views: 508
Reputation: 1021
Here's a fun base R one-liner (using dat as defined by Jilber's answer):
with(dat, tapply(as.character(V3), list(V1, V2), identity))
But Matthew's answer is essentially the right approach.
Upvotes: 0
Reputation: 61214
You need column names in your data.frame; then use dcast from the reshape2 package.
dat <- read.table(text="Line_A Marker_1 AA
Line_A Marker_2 TT
Line_A Marker_3 CC
Line_B Marker_1 TT
Line_B Marker_3 AA
Line_C Marker_2 AA
Line_D Marker_1 AA
Line_D Marker_4 AA", header=FALSE)
library(reshape2)
# V1 = line, V2 = marker, V3 = genotype; name the value column explicitly
res <- dcast(dat, V1 ~ V2, value.var = "V3")
res
# V1 Marker_1 Marker_2 Marker_3 Marker_4
#1 Line_A AA TT CC <NA>
#2 Line_B TT <NA> AA <NA>
#3 Line_C <NA> AA <NA> <NA>
#4 Line_D AA <NA> <NA> AA
To get the output you want, use print and set na.print="--":
print(res, na.print="--")
# V1 Marker_1 Marker_2 Marker_3 Marker_4
#1 Line_A AA TT CC --
#2 Line_B TT -- AA --
#3 Line_C -- AA -- --
#4 Line_D AA -- -- AA
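The data used above leaves out the Generation column and the duplicated rows, so the two rules from the edited question are not exercised. A minimal base R sketch of one way to apply them before reshaping is given here. It assumes that generations always look like "F" followed by a number and that a larger number means a later generation; the names full, known, and gen are only illustrative.

```r
# Full data from the question, including the Generation column.
full <- read.table(text = "Line_name Marker_name Genodata Generation
Line_A Marker_1 AA F7
Line_A Marker_2 TT F7
Line_A Marker_3 CC F7
Line_B Marker_1 TT F7
Line_B Marker_3 AT F6
Line_B Marker_3 AA F7
Line_C Marker_2 AA F7
Line_C Marker_2 -- F8
Line_D Marker_1 -- F7
Line_D Marker_1 AA F8
Line_D Marker_4 AA F8", header = TRUE, stringsAsFactors = FALSE)

# Assumption: "F7" -> 7, "F8" -> 8, and a larger number is a later generation.
full$gen <- as.integer(sub("^F", "", full$Generation))

# Rule 2: drop missing calls so that any available data wins over "--".
known <- full[full$Genodata != "--", ]

# Rule 1: put the latest generation first, then keep only the first row
# for each line/marker pair.
known <- known[order(known$gen, decreasing = TRUE), ]
known <- known[!duplicated(known[c("Line_name", "Marker_name")]), ]

# Start from a matrix filled with "--" and fill in the known genotypes
# by indexing with (row name, column name) pairs.
lines   <- sort(unique(full$Line_name))
markers <- sort(unique(full$Marker_name))
m <- matrix("--", nrow = length(lines), ncol = length(markers),
            dimnames = list(lines, markers))
m[cbind(known$Line_name, known$Marker_name)] <- known$Genodata

print(noquote(m))
```

Taking the row and column names from full rather than known means a line or marker that only ever has missing data would still show up, as a row or column of "--".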
Upvotes: 2
Reputation: 42689
You can create a matrix whose rownames and colnames come from the first two columns of the data, then fill in the values using matrix indexing (x here is the data frame holding the data):
m <- matrix(,nrow=length(unique(x[,1])), ncol=length(unique(x[,2])))
rownames(m) <- unique(x[,1])
colnames(m) <- unique(x[,2])
m[as.matrix(x[,1:2])] <- as.character(x[,3])
m
Marker_1 Marker_2 Marker_3 Marker_4
Line_A "AA" "TT" "CC" NA
Line_B "TT" NA "AA" NA
Line_C NA "AA" NA NA
Line_D "AA" NA NA "AA"
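The result above shows NA and quoted strings, whereas the question asks for "--" in the empty cells. A small self-contained sketch of the same indexing approach with that last step added is given here. It assumes x is read from the three-column, duplicate-free data used in the other answer.

```r
# Assumption: x holds line, marker and genotype in columns 1 to 3.
x <- read.table(text = "Line_A Marker_1 AA
Line_A Marker_2 TT
Line_A Marker_3 CC
Line_B Marker_1 TT
Line_B Marker_3 AA
Line_C Marker_2 AA
Line_D Marker_1 AA
Line_D Marker_4 AA", header = FALSE, stringsAsFactors = FALSE)

# Empty character matrix named after the unique lines and markers.
m <- matrix(NA_character_,
            nrow = length(unique(x[, 1])), ncol = length(unique(x[, 2])),
            dimnames = list(unique(x[, 1]), unique(x[, 2])))

# Each row of the two-column index matrix is a (row name, column name) pair.
m[as.matrix(x[, 1:2])] <- x[, 3]

# Replace the cells that were never filled, then print without quotes.
m[is.na(m)] <- "--"
print(noquote(m))
```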
Upvotes: 2