Reputation: 1334
Let's say I have a list of characters: ['A','C','G','U']
and I want to make strings of a certain length, let's say 5.
From this, I can represent each string of this length by its index in dictionary order. For example, AAAAA is 1, AAAAC is 2, ..., AAACA is 5, and so on.
My question is, given an arbitrary string of this length, let's say GUGAC, how do I get its index using R? (In this case, for GUGAC, it should be 738)
Upvotes: 3
Views: 82
Reputation: 24079
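What you have here is a base-4 numbering system. The method is to convert each letter into its base-4 digit (A=0, C=1, G=2, U=3), multiply each digit by the matching power of 4 (4^(n-1) for the first letter down to 4^0 for the last, where n is the string length), and take the sum of the values. Adding 1 at the end makes the index start at 1 instead of 0.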
string<-"GUGAC"
#Convert string to a vector of letters
strletters<-unlist(strsplit(string, ""))
#Convert letters to base-4 digits (counting runs 0, 1, 2, 3, 10, 11, etc.)
facts<-factor(strletters, levels=c("A", "C", "G", "U"))
nums<-as.integer(facts)-1
#Create vector of multipliers: 4^(n-1), ..., 4^1, 4^0
multipliers<-4^((length(nums)-1):0)
#Sum of multipliers * nums, plus 1 because the index starts at 1 rather than 0
sum(multipliers*nums)+1
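As a check on the arithmetic, GUGAC maps to the digits 2, 3, 2, 0, 1, so the sum is 2*256 + 3*64 + 2*16 + 0*4 + 1*1 = 737, and adding 1 gives 738. If you need this for many strings, the same steps could be wrapped in a function. The following is only a sketch: the names seq_index and index_seq are made up for illustration, and it assumes every character of the input is in the alphabet. The second function goes the other way, from index back to string, which is handy for verifying results.

```r
# Hypothetical helper: 1-based dictionary-order index of string s.
seq_index <- function(s, alphabet = c("A", "C", "G", "U")) {
  chars <- strsplit(s, "")[[1]]
  # match() gives the 1-based position in the alphabet; subtract 1 for digits
  digits <- match(chars, alphabet) - 1
  if (anyNA(digits)) stop("string contains a character outside the alphabet")
  base <- length(alphabet)
  # exponents run from n-1 down to 0
  exponents <- rev(seq_along(digits)) - 1
  sum(digits * base^exponents) + 1
}

# Hypothetical inverse: the string of length n that has 1-based index i.
index_seq <- function(i, n, alphabet = c("A", "C", "G", "U")) {
  base <- length(alphabet)
  exponents <- rev(seq_len(n)) - 1
  # extract each base-4 digit of the 0-based index
  digits <- ((i - 1) %/% base^exponents) %% base
  paste(alphabet[digits + 1], collapse = "")
}

seq_index("GUGAC")   # 738
index_seq(738, 5)    # "GUGAC"
```

Using match() instead of factor() is just a design choice here; both turn letters into positions in the alphabet, and match() returns NA for unknown characters, which makes the input check straightforward.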
Upvotes: 3