Reputation: 61
Hi all, I am trying to create a distance matrix from randomly generated sequences.
#set the code
DNA <- c("A","G","T","C")
randomDNA <- c()
#create the vector of 64 elements
for (i in 1:64){
  randomDNA[i] <- paste0(sample(DNA, 6, replace = T), sep = "", collapse = "")
  warnings()
}
sizeofDNA <- length(randomDNA)
#this part that I want to iterate between vector's components
split_vector <- c()
DNAdiff <- c()
for (i in 1:length(randomDNA)){
  split_vector <- strsplit(randomDNA[i], "")[[1]]
  #print(split_vector)
  for (j in 1:length(randomDNA)){
    split_vector2 <- strsplit(randomDNA[j], "")[[1]]
    #print(split_vector2)
    DNAdiff[i,j] <- setdiff(split_vector,split_vector2)
    #or
    #DNAdiff[i] <- lenght(setdiff(strsplit(randomDNA[22], "")[[1]],strsplit(randomDNA[33], "")[[1]]))
  }
}
What does not work: A: setdiff does not behave as I expect. B: no array is created.
Question: how do I store the results of setdiff (if it works) in an array, so that I end up with a distance-matrix-like array? Any recommendation is highly welcome. Thank you all.
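For reference, the two symptoms above can be reproduced in isolation. The sketch below uses made-up toy values (`a`, `b`, `m`), not the data from the question. `setdiff` treats its arguments as sets, so it returns the unique elements of the first that are absent from the second rather than a per-position mismatch count, and an object created with `c()` has no dimensions, so `[i, j]` assignment fails.

```r
# Toy inputs (hypothetical, for illustration only)
a <- strsplit("AAGTCA", "")[[1]]
b <- strsplit("AGGTCC", "")[[1]]

# setdiff() is a set operation: both sequences contain A, G, T and C,
# so nothing is "missing" even though two positions differ.
set_result <- setdiff(a, b)          # character(0)

# A position-wise comparison counts the actual mismatches.
mismatches <- sum(a != b)            # 2

# c() returns NULL, which has no dim attribute, so [i, j] assignment errors.
m <- c()
assign_failed <- inherits(try(m[1, 1] <- 1, silent = TRUE), "try-error")

# Pre-allocating a matrix makes [i, j] assignment work.
m <- matrix(NA_integer_, nrow = 2, ncol = 2)
m[1, 2] <- mismatches
```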
EDIT: So there are two solutions:
A. Using the adist function, as mentioned in the comments by @ThomasIsCoding; this calculates the Levenshtein distances:
DNA <- c("A","G","T","C")
randomDNA <- c()
# generate 64 random 6-mers
for (i in 1:64){
  randomDNA[i] <- paste0(sample(DNA, 6, replace = TRUE), collapse = "")
}
# adist() already returns a matrix of pairwise edit distances
dm <- adist(randomDNA)
rownames(dm) <- randomDNA
colnames(dm) <- randomDNA
pdf("heatmap.pdf")
heatmap(dm, Rowv = NA, Colv = NA)
dev.off()
# write.csv() always writes column names; passing col.names only triggers a warning
write.csv(dm, "distance_matrix.csv", row.names = TRUE)
B. Another method, which calculates the Hamming distance instead:
DNA <- c("A","G","T","C")
randomDNA <- c()
# generate 96 random 6-mers
for (i in 1:96){
  randomDNA[i] <- paste0(sample(DNA, 6, replace = TRUE), collapse = "")
}
# pre-allocate the result so that [i,j] assignment works
Humm <- matrix(nrow=length(randomDNA), ncol=length(randomDNA))
for (i in seq_along(randomDNA)){
  split_vector <- strsplit(randomDNA[i], "")[[1]]
  for (j in seq_along(randomDNA)){
    split_vector2 <- strsplit(randomDNA[j], "")[[1]]
    #Hamming distance is the number of positions that differ:
    Humm[i,j] <- sum(split_vector != split_vector2)
  }
}
rownames(Humm) <- randomDNA
colnames(Humm) <- randomDNA
pdf("heatmap.pdf")
heatmap(Humm, Rowv = NA, Colv = NA)
dev.off()
# write.csv() always writes column names; passing col.names only triggers a warning
write.csv(Humm, "distance_matrix.csv", row.names = TRUE)
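The nested loop in B can also be written so that the loop runs over the sequence positions rather than over every pair of sequences. A minimal sketch, assuming all sequences have the same length (as they do here); `hamming_matrix`, `chars` and `example` are names introduced for this illustration, not part of base R:

```r
# Hypothetical helper: pairwise Hamming distances for equal-length strings.
hamming_matrix <- function(seqs) {
  stopifnot(length(unique(nchar(seqs))) == 1)   # Hamming needs equal lengths
  # One row per sequence, one column per position
  chars <- do.call(rbind, strsplit(seqs, ""))
  n <- length(seqs)
  d <- matrix(0L, nrow = n, ncol = n, dimnames = list(seqs, seqs))
  # Add up the mismatches one position (column) at a time
  for (k in seq_len(ncol(chars))) {
    d <- d + outer(chars[, k], chars[, k], FUN = "!=")
  }
  d
}

example <- c("AAGTCA", "AGGTCC", "AAGTCA")
hd <- hamming_matrix(example)
```

With 6-character sequences this does 6 `outer` calls regardless of how many sequences there are, and the result should match `Humm` when applied to the same `randomDNA`.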
Upvotes: 1
Views: 210
Reputation: 101099
I think you might need adist
to get the distance matrix, e.g.,
adist(randomDNA)
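As a small illustration of what `adist` returns (toy strings chosen for this example, not taken from the question): it computes the generalized Levenshtein (edit) distance, which allows insertions and deletions, so for equal-length sequences it can be smaller than a position-by-position mismatch count.

```r
x <- c("ACGT", "CGTA", "ACGA")
lev <- adist(x)   # square matrix of edit distances; adist() lives in utils

# Position-wise mismatches between the first two strings, for comparison
ham_12 <- sum(strsplit(x[1], "")[[1]] != strsplit(x[2], "")[[1]])

lev[1, 2]   # 2: delete the leading "A", then append an "A"
ham_12      # 4: every position differs
```

If substitutions at fixed positions are the only changes that should count, the position-wise comparison is the one to use.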
Upvotes: 1