Reputation: 2134
I had this question in one of my interviews:
Given a string how would you compress it?
Example input is not in the form of aabbccdd
but like abcdgehrk
. i.e. all chars
in the string
are different.(Note:Run Length Encoding wont work ,as it was one of the solutions which I gave but he said string does not have any repetitive characters)
I gave the following two solutions but he did not accept them:
1) HashCode cannot be a solution as it will store numbers
2) Can't store in binary as its not human readable format
Can anyone please suggest what could be another solution for this problem?
Upvotes: 3
Views: 2182
Reputation: 459
I had solved like this question before, I will comprise string e.g (aaabb) after this process it will become (a3b1) then I will check if the length of the comprise string is less than the original string i will return comprise string otherwise return original (ab) - > (a1b1) in this situation I will return an original string. (aaaaabb) ->(a5b2) in this situation I will return comprise string. here is my code for this process it will take O(N)
public static String stringCompression(String str){
StringBuilder compressed = new StringBuilder();
int count = 1;
int i = 0;
for ( i = 0; i <str.length()-1 ; i++) {
if(str.charAt(i) == str.charAt(i+1)){
// System.out.println("str.charAt(i) = " + str.charAt(i));
count++;
}
else {
compressed.append(str.charAt(i)).append(count);
count =1;
}
}
if(i == str.length()-1)
compressed.append(str.charAt(i)).append(count);
return compressed.length() < str.length() ? new String(compressed): str;
}
and u may use this algo to decompse data
public static String stringDeCompression(String str){
StringBuilder stringBuilder = new StringBuilder();
int temp = 0;
int k = 0;
for (int i = 1; i <str.length() ; i+=2) {
temp = Character.getNumericValue(str.charAt(i));
for (int j = 0 ; j < temp ; j++) {
stringBuilder.append(str.charAt(k));
}
k+=2;
}
return new String(stringBuilder);
}
Upvotes: 0
Reputation: 3112
If the requirements allow that a string is composed of only lower case alphabetical characters, then each character can fit in 5 bits (2^5 = 32 possible characters). An 8-character string can then be fit into 40 bits = 5 bytes.
Here is an example, fitting 3 characters into 2 bytes:
a = 00001
b = 00010
c = 00011
The string "cab" can fit in:
c a b (extra bit)
00011 00001 00010 0
00011000 01000100
In big-endian form:
0x1844
The requirement that it be human readable is silly. For anything of this nature, software and standards (e.g. ASCII) are required in order for it to be read by a human. With the right software and output device, anything can be human readable.
Upvotes: 1
Reputation: 2292
Given that the examiner required that the compressed string is human-readable, one solution is Run-Length Encoding.
aabbccdd would therefore be compressed as 2a2b2c2d, and abcdgehrk as 1a1b1c1d1g1e1h1r1k.
Note that the output string in these special examples are not shorter than the original string, but it's a property of all lossless compression algorithms that they cannot guarantee compression for any input dataset.
Upvotes: 4