Claudio Moneo

Reputation: 569

Mapping Letters to Numbers in R

I have a vector of strings, each consisting of n letters, for example "ABCDEF".

I need to map each string to a unique number. The intuitive approach is to extract the individual letters and match each one to its corresponding number via

match(letter,LETTERS)

But that leads to numbers that are too large for large n, because I need 2 digits for every letter (from 01 to 26).

My idea is now to map each string to a unique number between 1 and 26^n, making use of the fact that 26^n has fewer than 2n digits for large n.

For example, for n=4 we get "AAAA" -> 1 and "ZZZZ" -> 26^4.
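The digit-count claim can be checked with a quick sketch: the number of decimal digits of 26^n is floor(n * log10(26)) + 1, roughly 1.41 digits per letter instead of 2. The variable names here are only illustrative.

```r
# Compare the 2n digits of the per-letter encoding with the digits of 26^n.
n <- 1:10
digits_base26 <- floor(n * log10(26)) + 1   # decimal digits of 26^n

data.frame(n = n, per_letter = 2 * n, base26 = digits_base26)
```

For n = 1 both encodings need 2 digits; from n = 2 onwards the base-26 number is shorter.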

How can I do this in R?

Upvotes: 3

Views: 695

Answers (2)

dsz

Reputation: 5202

While this may be clever, using a factor may be much simpler and far easier to understand. You also get to keep the string format close to hand, while getting the space saving of it being encoded as an integer.

If you need integers in a database (which will do joins better on them) then you can cast the factor to an int with as.integer(factor_column) and you'll have the integer variants too.

What you'll lose is the determinism of the mapping, which may be important for you in the DB world if this is anything more than a one-off data load.
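A minimal sketch of the factor approach, using a made-up input vector x. It also illustrates the determinism caveat: the integer codes depend on which strings are present, so the same string can get a different code once new data arrives.

```r
# Hypothetical input vector; any character vector works the same way.
x <- c("ABCDEF", "ZZZZZZ", "ABCDEF", "AAAAAA")

fx <- factor(x)             # levels are the sorted unique strings
codes <- as.integer(fx)     # compact integer codes, one per element

# The original strings can be recovered from the levels.
recovered <- levels(fx)[codes]

# Codes are data-dependent: adding a new string can shift existing codes.
x2 <- c(x, "AAAAAB")
codes2 <- as.integer(factor(x2))
```

Here "ABCDEF" is the second level in x but the third level in x2, so its code changes between the two loads.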

Upvotes: 0

ThomasIsCoding

Reputation: 101663

I guess you want to encode the letters as below

f <- function(letter) sum((match(unlist(strsplit(letter, "")), LETTERS) - 1) * 26^((nchar(letter) - 1):0)) + 1

such that

> f("AAAA")
[1] 1

> f("AABC")
[1] 29

> f("ZZZZ")
[1] 456976
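Since the question mentions a vector of strings and f handles one string at a time, here is a sketch of a vectorized variant together with an inverse mapping. The names encode and decode are made up for illustration, and the code assumes non-empty strings of uppercase letters A to Z. The result is a double, so it is only exact while 26^n stays below 2^53, that is for n up to 11.

```r
# Vectorized base-26 encoding: "AAAA" -> 1, "ZZZZ" -> 26^4.
# Assumes non-empty strings of uppercase letters only.
encode <- function(x) {
  vapply(strsplit(x, ""), function(chars) {
    digits <- match(chars, LETTERS) - 1              # A = 0, ..., Z = 25
    sum(digits * 26^((length(digits) - 1):0)) + 1    # shift so "A...A" maps to 1
  }, numeric(1))
}

# Inverse mapping: recover the n-letter string from its code.
decode <- function(code, n) {
  vapply(code, function(k) {
    k <- k - 1                                       # back to a 0-based number
    digits <- integer(n)
    for (i in n:1) {                                 # peel off base-26 digits from the right
      digits[i] <- k %% 26
      k <- k %/% 26
    }
    paste(LETTERS[digits + 1], collapse = "")
  }, character(1))
}

codes <- encode(c("AAAA", "AABC", "ZZZZ"))
strings <- decode(codes, 4)
```

With the three inputs above, codes matches the values returned by f, and decode(codes, 4) gives back the original strings.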

Upvotes: 2
