Reputation: 569
I have a vector of strings, each consisting of n letters, for example "ABCDEF".
I need to map each string to a unique number. Of course, the intuitive approach is to extract the single letters and then match them one by one to the corresponding number via match(letter, LETTERS).
But that leads to numbers that are too large for large n, because I need 2 digits for every single letter (from 01 to 26).
My idea is now to map each string to a unique number between 1 and 26^n, making use of the fact that 26^n has fewer than 2n digits for large n.
For example, for n=4 we get "AAAA" -> 1 and "ZZZZ" -> 26^4.
How can I do this in R?
Upvotes: 3
Views: 695
Reputation: 5202
While this may be clever, using a factor may be much simpler and far easier to understand. You also get to keep the string format close to hand, while getting the space saving of it being encoded as an integer.
If you need integers in a database (which will do joins better on them), then you can cast the factor to an integer with as.integer(factor_column) and you'll have the integer variants too.
What you'll lose is the determinism of the mapping, because the integer codes depend on which strings are present in the data. That may be important for you in the DB world if this is anything more than a one-off data load.
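A minimal sketch of the factor approach, and of the determinism caveat. The vectors x and y and the level set all_levels are made up for illustration; fixing the levels up front is one possible way to keep the codes stable, assuming you can list every string in advance.

```r
x <- c("ABCD", "AAAA", "ZZZZ", "ABCD")

# Default levels are the sorted unique values, so the codes depend on the data.
codes <- as.integer(factor(x))

# Hypothetical fixed level set: the codes stay the same across loads,
# as long as every string that can occur is listed here.
all_levels <- c("AAAA", "ABCD", "ZZZZ", "QQQQ")
stable_codes <- as.integer(factor(x, levels = all_levels))

# Same kind of data without "AAAA": the default coding shifts,
# the fixed-level coding does not.
y <- c("ABCD", "ZZZZ")
shifted  <- as.integer(factor(y))
stable_y <- as.integer(factor(y, levels = all_levels))
```

Here "ABCD" is coded as 2 in x but as 1 in y with the default levels, while the fixed levels give 2 in both cases.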
Upvotes: 0
Reputation: 101663
I guess you want to encode each string as a base-26 number (A = 0, ..., Z = 25) plus 1, like below
f <- function(letter) sum((match(unlist(strsplit(letter, "")), LETTERS) - 1) * 26^((nchar(letter) - 1):0)) + 1
such that
> f("AAAA")
[1] 1
> f("AABC")
[1] 29
> f("ZZZZ")
[1] 456976
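Since the question starts from a vector of strings, here is a sketch of the same base-26 idea applied element-wise, together with an inverse. The names encode and decode are hypothetical, and the code assumes every string contains only uppercase A to Z; decode additionally needs the string length n.

```r
# Encode each string as a base-26 number (A = 0, ..., Z = 25), plus 1.
encode <- function(x) {
  vapply(strsplit(x, "", fixed = TRUE), function(chars) {
    digits <- match(chars, LETTERS) - 1
    sum(digits * 26^(rev(seq_along(digits)) - 1)) + 1
  }, numeric(1), USE.NAMES = FALSE)
}

# Inverse mapping: recover the n-letter string from its code.
decode <- function(code, n) {
  vapply(code, function(k) {
    digits <- ((k - 1) %/% 26^((n - 1):0)) %% 26
    paste(LETTERS[digits + 1], collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

codes <- encode(c("AAAA", "AABC", "ZZZZ"))
back  <- decode(codes, 4)
```

The result is a double vector. Doubles represent integers exactly only up to 2^53, so the codes stay exact for n up to 11, and 26^n fits into R's 32-bit integer type only for n up to 6.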
Upvotes: 2