Reputation: 21
I have a requirement to shorten a 6-character string like "ABC123" into a unique 4-character string. It has to be repeatable, so that the same input string always generates the same output string. Does anyone have any ideas how to do this?
Upvotes: 2
Views: 7492
Reputation: 533530
You have to make assumptions about the range of values the characters can have and what counts as an acceptable encoded character. There are any number of ways you can do this. You could pack the String into 1, 2, 3, 4 or 5 characters depending on your assumptions.
One simple example which would give you 4 characters is to assume the last three characters form a number (000 to 999) and store that number in a single char.
public class Pack {
    // Assumes the last three characters are digits; stores 0..999 in one char.
    public static String pack(String text) {
        return text.substring(0, 3) + (char) Integer.parseInt(text.substring(3));
    }

    // Reverses pack(): 1000 + n followed by substring(1) zero-pads n to three digits.
    public static String unpack(String text) {
        return text.substring(0, 3) + ("" + (1000 + text.charAt(3))).substring(1);
    }

    public static void main(String[] args) {
        String text = "ABC123";
        String packed = pack(text);
        System.out.println("packed length= " + packed.length());
        String unpacked = unpack(packed);
        System.out.println("unpacked= '" + unpacked + '\'');
    }
}
prints
packed length= 4
unpacked= 'ABC123'
Upvotes: 0
Reputation: 4239
Assumption: The input string can only have characters with ASCII decimal values below 128... otherwise, as others have stated, this won't work.
public class Foo {
    public static int crunch(String str) {
        int y = 0;
        int limit = str.length() > 6 ? 6 : str.length();
        for (int i = 0; i < limit; ++i) {
            y += str.charAt(i) * (limit - i);
        }
        return y;
    }

    public static void main(String[] args) {
        String[] words = new String[]{
            "abcdef", "acdefb", "fedcba", "}}}}}}", "ZZZZZZ", "123", "!"
        };
        for (int idx = 0; idx < words.length; ++idx) {
            System.out.printf("in=%-6s out=%04d\n",
                words[idx], crunch(words[idx]));
        }
    }
}
Generates:
in=abcdef out=2072
in=acdefb out=2082
in=fedcba out=2107
in=}}}}}} out=2625
in=ZZZZZZ out=1890
in=123    out=0298
in=!      out=0033
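One thing worth checking before relying on a weighted sum like this: it is repeatable, but it does not appear to be unique. The sketch below copies the crunch logic into a hypothetical class named CrunchCollision (not part of the answer) and compares two different inputs chosen so that the change in one position cancels the change in another.

```java
public class CrunchCollision {
    // Same weighted sum as crunch above, copied so this sketch is self-contained.
    static int crunch(String str) {
        int y = 0;
        int limit = Math.min(str.length(), 6);
        for (int i = 0; i < limit; ++i) {
            y += str.charAt(i) * (limit - i);
        }
        return y;
    }

    public static void main(String[] args) {
        // 'e' -> 'd' at weight 2 lowers the sum by 2; 'f' -> 'h' at weight 1 raises it by 2.
        System.out.println(crunch("abcdef")); // 2072
        System.out.println(crunch("abcddh")); // 2072
    }
}
```

Both inputs produce 2072, so this approach only fits if occasional collisions are acceptable.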
Upvotes: 0
Reputation: 17010
Not sure this can be done, as I would bet there are some business constraints (like a user has to be able to type in the key).
The idea is to "hash" down the value into a smaller number of places. This requires a character set large enough to handle all combinations.
Let's assume the original key is case insensitive. You have 26 + 10 = 36 possible characters per position, so 36^6 = 2,176,782,336 unique combinations. To match this in only 4 characters, you have to use a character set with 216 unique characters, as 216^4 is 2,176,782,336, and 216 is the smallest base whose 4th power covers every case-insensitive alphanumeric key. (Case sensitivity plus digits only takes you to 62 characters.)
If we take the standard US keyboard, we have:
26 letters x 2 cases = 52
10 numbers
10 special characters on number keys
11 other special character keys x 2 = 22
This is 94 unique characters, or less than half the unique characters you need just to get a case-insensitive 6-character code into 4 characters. Now, on the Planet Klingon, where keyboards are much larger ... ;-)
If the key is case sensitive (62 possible characters per position), your character set has to expand to 489 unique characters to fit in a 4-character "hash". Ouch!
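The alphabet sizes quoted in this answer can be checked with a few lines of arithmetic. This is a minimal sketch; the class name AlphabetSize and the helper minAlphabet are made up for illustration. It searches for the smallest k such that k^4 is at least n^6.

```java
public class AlphabetSize {
    // Smallest alphabet size k such that k^outLen >= inAlphabet^inLen.
    static long minAlphabet(long inAlphabet, int inLen, int outLen) {
        long inputs = pow(inAlphabet, inLen);
        long k = 1;
        while (pow(k, outLen) < inputs) {
            k++;
        }
        return k;
    }

    // Plain integer power; no overflow check, which is fine for the small values used here.
    static long pow(long base, int exp) {
        long result = 1;
        for (int i = 0; i < exp; i++) {
            result *= base;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(minAlphabet(36, 6, 4)); // case-insensitive letters + digits: 216
        System.out.println(minAlphabet(62, 6, 4)); // case-sensitive letters + digits: 489
    }
}
```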
Upvotes: 1
Reputation: 36987
You need some restrictions on the input string, otherwise math will inevitably bite you.
For example, let's assume you know that it consists of upper case letters and digits only. Therefore, there are 36^6 possible input strings.
The result needs to have fewer restrictions, e.g. you allow 216 different characters (printable extended ASCII or something like that).
By pure coincidence, 216^4 = 36^6, so what you need is a mapping. That's easy, just use the algorithm for converting number representations from one radix to another.
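The radix conversion could look roughly like the sketch below. It is only an illustration under stated assumptions: the input alphabet is exactly 0-9 and A-Z, and the 216 output characters are taken to be the consecutive code points starting at U+0100, which is an arbitrary choice made here because it is easy to compute. The class name RadixPack and its methods are hypothetical.

```java
public class RadixPack {
    private static final String IN_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final int OUT_RADIX = 216;
    // Assumption: output digit d is written as the char with code 0x100 + d.
    private static final char OUT_BASE = (char) 0x100;

    // Reads the 6 input characters as a base-36 number, then writes it as 4 base-216 digits.
    static String pack(String text) {
        if (text.length() != 6) {
            throw new IllegalArgumentException("expected 6 characters");
        }
        long value = 0;
        for (int i = 0; i < 6; i++) {
            int digit = IN_ALPHABET.indexOf(text.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("unsupported character: " + text.charAt(i));
            }
            value = value * 36 + digit;
        }
        char[] out = new char[4];
        for (int i = 3; i >= 0; i--) {
            out[i] = (char) (OUT_BASE + value % OUT_RADIX);
            value /= OUT_RADIX;
        }
        return new String(out);
    }

    // Inverse of pack(): reads 4 base-216 digits, then writes 6 base-36 digits.
    static String unpack(String packed) {
        if (packed.length() != 4) {
            throw new IllegalArgumentException("expected 4 characters");
        }
        long value = 0;
        for (int i = 0; i < 4; i++) {
            int digit = packed.charAt(i) - OUT_BASE;
            if (digit < 0 || digit >= OUT_RADIX) {
                throw new IllegalArgumentException("unsupported character in packed string");
            }
            value = value * OUT_RADIX + digit;
        }
        char[] out = new char[6];
        for (int i = 5; i >= 0; i--) {
            out[i] = IN_ALPHABET.charAt((int) (value % 36));
            value /= 36;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        String packed = pack("ABC123");
        System.out.println("packed length= " + packed.length());
        System.out.println("unpacked= '" + unpack(packed) + "'");
    }
}
```

Because 36^6 equals 216^4 exactly, every valid input maps to a distinct 4-character output and back. Whether the chosen output characters are acceptable (typeable, storable in your encoding) is a separate question.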
Upvotes: 4
Reputation: 11640
It is not possible to do a fully unique mapping from a 6-character string to a 4-character string drawn from the same character set. What you are describing is essentially a simple hash function. Because the range space is smaller than the domain space, you are necessarily going to have some hash collisions. You can try to minimize the number of collisions based on the type of data you're going to be accepting, but ultimately it's impossible to map every 6-character string to a unique 4-character string: you would run out of 4-character strings.
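To put a number on that, here is a small counting sketch. It assumes input and output use the same alphabet, with 36 characters chosen purely as an example; the class name Pigeonhole and the helper minSharing are hypothetical. It computes how many inputs must share at least one output, however the mapping is chosen.

```java
public class Pigeonhole {
    // With alphabet^inLen inputs and alphabet^outLen outputs, some output
    // must be shared by at least ceil(inputs / outputs) inputs.
    static long minSharing(long alphabet, int inLen, int outLen) {
        long inputs = 1;
        for (int i = 0; i < inLen; i++) {
            inputs *= alphabet;
        }
        long outputs = 1;
        for (int i = 0; i < outLen; i++) {
            outputs *= alphabet;
        }
        return (inputs + outputs - 1) / outputs; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(minSharing(36, 6, 4)); // 1296
    }
}
```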
Upvotes: 9