stackoverflow

Reputation: 19444

Java: how to store/represent a String in a long, then convert the long back to a String

How would you store/represent a String in a long, and then put that long into an 8-byte array?

Things I've tried/working with

    String eidString = "Awesome!";
    ByteBuffer buf = ByteBuffer.allocate(8);
    CharBuffer cbuf = buf.asCharBuffer();
    cbuf.put(eidString);

    byte[] eid = ByteBuffer.allocate(8).putLong(cbuf ??);

Attempt 2

    Long l = Long.valueOf("Awesome!");

    byte[] eid = ByteBuffer.allocate(8).putLong(l).array();

    long p = ByteBuffer.wrap(eid).getLong();

    System.out.println(p);

Attempt 3

String input = "hello long world";

byte[] bytes = input.getBytes();
LongBuffer tmpBuf = ByteBuffer.wrap(bytes).asLongBuffer();

long[] lArr = new long[tmpBuf.remaining()];
for (int i = 0; i < lArr.length; i++)
    lArr[i] = tmpBuf.get();

System.out.println(input);
System.out.println(Arrays.toString(lArr));
// store longs...

// ...load longs
long[] longs = { 7522537965568945263L, 7955362964116237412L };
byte[] inputBytes = new byte[longs.length * 8];
ByteBuffer bbuf = ByteBuffer.wrap(inputBytes);
for (long l : longs)
    bbuf.putLong(l);
System.out.println(new String(inputBytes));

Upvotes: 2

Views: 4226

Answers (4)

recursion.ninja

Reputation: 5488

If you accept the limited character set of:

 a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,
 A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,
 0,1,2,3,4,5,6,7,8,9, , <-- space character

then you will have 63 symbols in your reduced alphabet R. Each symbol can be mapped to a 6-bit representation (6 bits give 64 unique combinations). There is an implicit 64th symbol, the empty symbol, which marks the termination of the string and should be represented by 0x00. This gives us an alphabet R that maps to 6-bit values as a bijection.

A long has 64 bits of information. Since 64/6 = 10 with integer division (4 bits are left over), we can store a string of up to 10 characters from alphabet R in a Java long variable.

It is worth noting that even though R is a reduced alphabet and we have a string length limit of 10, we can still express meaningful English phrases, convert them to a long, and back again!

Java Code (64-symbol mapping, 6 bits per character):

public static long stringToLong(String s) {
  if(s.length() >10) { throw new IllegalArgumentException("String is too long: "+s); }
  long out = 0L;
  for(int i=0; i<s.length(); ++i) {
    long m = reducedMapping(s.codePointAt(i));
    if (m==-1) { throw new IllegalArgumentException("Unmapped Character in String: "+s); }
    m <<= ((9-i)*6)+4;
    out |= m;
  }
  return out;
}

public static String longToString(long l) {
  String out = "";
  long m = 0xFC00000000000000L;
  for(int i=0; i<10; ++i,m>>>=6) {
    int x =(int)( (l&m) >>> (((9-i)*6)+4));
    if(x==0) { break; }
    out += mapping[x];
  }
  return out;
}


public static long reducedMapping(int x) {
  long out=-1;
       if(x >= 97 && x <= 122) { out = (long)(x-96); } //  'a' =>  1 : 0x01
  else if(x >= 65 && x <=  90) { out = (long)(x-38); } //  'A' => 27 : 0x1B
  else if(x >= 48 && x <=  57) { out = (long)(x+ 5); } //  '0' => 53 : 0x35
  else if(x == 32 )            { out = 63L;          } //  ' ' => 63 : 0x3F
  return out;
}
public static char[] mapping = {
'\n', //<-- unused/empty character
'a','b','c','d','e','f','g','h','i','j','k','l','m',
'n','o','p','q','r','s','t','u','v','w','x','y','z',
'A','B','C','D','E','F','G','H','I','J','K','L','M',
'N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
'0','1','2','3','4','5','6','7','8','9',' '
};

Java Code (32-symbol mapping, 5 bits per character, case-insensitive):

public static long stringToLong32(String s) {
      if(s.length() >12) { throw new IllegalArgumentException("String is too long: "+s); }
      long out = 0L;
      for(int i=0; i<s.length(); ++i) {
        long m = reducedMapping32(s.codePointAt(i));
        if (m==-1) { throw new IllegalArgumentException("Unmapped Character in String: "+s); }
        m <<= ((11-i)*5)+4;
        out |= m;
      }
      return out;
    }

public static String longToString32(long l) {
      String out = "";
      long m = 0xF800000000000000L;
      for(int i=0; i<12; ++i,m>>>=5) {
        int x =(int)( (l&m) >>> (((11-i)*5)+4));
        if(x==0) { break; }
        out += mapping32[x];
      }
      return out;
    }

public static long reducedMapping32(int x) {
      long out=-1;
           if(x >= 97 && x <= 122) { out = (long)(x-96); } //  'a' =>  1 : 0x01
      else if(x >= 65 && x <=  90) { out = (long)(x-64); } //  'A' =>  1 : 0x01
      else if(x >= 32 && x <= 34)  { out = (long)(x-5);  } //  ' ','!','"' => 27,28,29
      else if(x == 44 )            { out = 30L;          } //  ',' => 30 : 0x1E
      else if(x == 46 )            { out = 31L;          } //  '.' => 31 : 0x1F
      return out;
    }
public static char[] mapping32 = {
    '\n', //<-- unused/empty character
    'a','b','c','d','e','f','g','h','i','j','k','l','m',
    'n','o','p','q','r','s','t','u','v','w','x','y','z',
    ' ','!','"',',','.'
};


EDIT:

Use this class for more generalized conversions. It allows Strings drawn from any non-empty set of characters (up to MAX_NUM_CHARS characters long) to be mapped uniquely to a long and converted back to the original String. Simply define a character set (char[]) through the constructor and use the resulting object to convert back and forth using str2long & long2str.

Java StringAndLongConverter Class:

import java.util.Arrays;
import java.util.regex.Pattern;

public class StringAndLongConveter {

  /* .,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,., */
  /*   String <--> long ID Conversion   */
  /* `'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`' */

  /* --<[ Class Members ]>-- */
  // Don't re-arrange, order-dependent initializations

  private final char[]  CHAR_MAP;
  public  final int     NUM_ACTUAL_CHARS;
  public  final int     NUM_MAPPED_CHARS;
  public  final int     BIT_COUNT;
  public  final int     MAX_NUM_CHARS;
  public  final long    ROLLING_MASK;
  public  final long    FORMAT_MASK;
  public  final long    MIN_VALUE;
  public  final long    MAX_VALUE;
  public  final Pattern REGEX_CHAR_VALIDATOR;

  public StringAndLongConveter(char[] chars) {
    if(chars == null  ) { throw new IllegalArgumentException("Cannot Pass in null reference"); }
    if(chars.length==0) { throw new IllegalArgumentException("Cannot Pass in empty set"     ); }
    CHAR_MAP             = setCharMap(chars);
    NUM_ACTUAL_CHARS     = CHAR_MAP.length;
    NUM_MAPPED_CHARS     = NUM_ACTUAL_CHARS+1;
    BIT_COUNT            = calcMinBitsNeeded();
    MAX_NUM_CHARS        = calcMaxPossibleChars();
    ROLLING_MASK         = calcRollingMask();
    FORMAT_MASK          = calcFormatMask();
    MIN_VALUE            = calcIDMinVal();
    MAX_VALUE            = calcIDMaxVal();
    REGEX_CHAR_VALIDATOR = createRegExValidator();
  }

  /* --<[ Dynamic Initialization Calculation Helper Methods ]>-- */

  //Remove duplicates
  private final char[] setCharMap(final char[] chars) {
    char[] tmp = new char[chars.length];    
    int dupes = 0;
    for(int i=0; i<chars.length; ++i) {
      boolean dupeFound = false;
      for(int j=0; !dupeFound && j<i; ++j) {
        if(chars[i]==chars[j]) {
          ++dupes;
          dupeFound = true;   
        }
      }
      if(!dupeFound) { tmp[i-dupes] = chars[i]; }
    }
    char[] out = new char[chars.length-dupes];
    if(dupes==0) { out = chars; }
    else {
      for(int i=0; i<out.length; ++i) out[i] = tmp[i];
    }
    return out;
  } 
  // calculate minimum bits necessary to encode characters uniquely
  private final int calcMinBitsNeeded() {
    if(NUM_MAPPED_CHARS==0) { return 0; }
    int val,tmp,log;
    val = NUM_MAPPED_CHARS;
    tmp = Integer.highestOneBit(val); // returns only the highest set bit
    tmp = tmp | (tmp-1);              // propagate left bits
    log = Integer.bitCount(tmp);      // count bits (logarithm base 2)
    return ((val&(val-1))==0) ? log-1 : log;
    //return one less than the bit count if val is an exact power of two
  }
  //Calculate maximum number of characters that can be encoded in long
  private final int calcMaxPossibleChars() {
    return Long.SIZE/BIT_COUNT;
  }
  //Calculate rolling mask for str <--> long conversion loops
  private final long calcRollingMask() {
    long   mask = 0x0000000000000001L;
    for(int i=1; i<BIT_COUNT;     ++i) { mask  |= mask << 1; }
    for(int i=1; i<MAX_NUM_CHARS; ++i) { mask <<= BIT_COUNT; }
    return mask;
  }
  //Calculate format mask for long input format validation
  private final long calcFormatMask() {
    //propagate least significant set bit in rolling mask & negate resulting value
    return ~(ROLLING_MASK | (ROLLING_MASK-1));
  }
  //Calculate min value of long encoding
  //doubles as format specification for unused bits
  private final long calcIDMinVal() {
    return 0xAAAAAAAAAAAAAAAAL & FORMAT_MASK;
  }
  //Calculate max value of long encoding
  private final long calcIDMaxVal(){
    char   maxChar    = CHAR_MAP[CHAR_MAP.length-1];
    char[] maxCharArr = new char[MAX_NUM_CHARS];
    Arrays.fill(maxCharArr, maxChar);
    return str2long(new String(maxCharArr));
  }

  //Dynamically create RegEx validation string for invalid characters
  private final Pattern createRegExValidator() {
    return Pattern.compile("^["+Pattern.quote(new String(CHAR_MAP))+"]+?$");
  }

  /* --<[ Internal Helper Methods ]>-- */

  private static boolean ulongLessThen(long lh, long rh) {
    return (((lh ^ rh) >> 63) == 0) ? lh < rh : (0x8000000000000000L & lh)==0;
  }

  private long charMapping(final char c) {
    for(int i=0; i<CHAR_MAP.length; ++i)
      if(CHAR_MAP[i]==c)
        return i+1;
    return -1;
  }

  /* --<[ String <--> long Conversion Methods ]>-- */

  public  final String long2str(final long n) {
    String out = "";
    if (ulongLessThen(n,MIN_VALUE) || ulongLessThen(MAX_VALUE,n)) { throw new IllegalArgumentException("Long Outside of Formatted Range: "+Long.toHexString(n)); }
    if ((FORMAT_MASK & n) != MIN_VALUE)                           { throw new IllegalArgumentException("Improperly Formatted long"); }
    long m = ROLLING_MASK;
    for(int i=0; i<MAX_NUM_CHARS; ++i,m>>>=BIT_COUNT) {
      int x =(int)( (n&m) >>> ((MAX_NUM_CHARS-i-1)*BIT_COUNT));//10|10 0111
      if(x >= NUM_MAPPED_CHARS) { throw new IllegalArgumentException("Invalid Formatted bit mapping: \nlong="+Long.toHexString(n)+"\n masked="+Long.toHexString(n&m)+"\n i="+i+" x="+x); }
      if(x==0) { break; }
      out += CHAR_MAP[x-1];
    }
    return out;
  }

  public  final long str2long(String str) {
    if(str.length() > MAX_NUM_CHARS) { throw new IllegalArgumentException("String is too long: "+str); }
    long out = MIN_VALUE;
    for(int i=0; i<str.length(); ++i) {
      long m = charMapping(str.charAt(i));
      if (m==-1) { throw new IllegalArgumentException("Unmapped Character in String: "+str); }
      m <<= ((MAX_NUM_CHARS-i-1)*BIT_COUNT);
      out += m; // += is more destructive than |= allowing errors to be more readily detected 
    }
    return out;
  }

  public  final boolean isValidString(String str) {
    return str != null && !str.equals("")               //null or empty String
       &&  str.length() <= MAX_NUM_CHARS                //too long
       &&  REGEX_CHAR_VALIDATOR.matcher(str).matches(); //only valid chars in string
  }

  public final char[] getMappedChars() { return Arrays.copyOf(CHAR_MAP,CHAR_MAP.length); }

}

Have fun encoding & decoding Strings & longs

Upvotes: 2

David Ruan

Reputation: 794

You don't need to do that. String has a method called getBytes() that does the conversion to bytes for you directly. Call the following method with the parameter "Hallelujah":

public static void strToLong(String s) throws IOException
{
    byte[] bArr = s.getBytes();
    for( byte b : bArr)
    {
        System.out.print(" " + b);
    }
    System.out.println();

    System.out.write(bArr);
}

The result is

 72 97 108 108 101 108 117 106 97 104
Hallelujah
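
To get from those bytes to an actual long and an 8-byte array, as the question asks, something along these lines might work. This is only a sketch: the class and method names are made up for illustration, it assumes the text is at most 8 characters from ISO-8859-1 (other characters would silently become '?'), and it assumes the text contains no NUL character, because zero bytes are used as padding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of any library.
class BytesToLongSketch {

    // Packs up to 8 single-byte characters into a long (big-endian, zero-padded on the right).
    static long pack(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        if (raw.length > Long.BYTES) {
            throw new IllegalArgumentException("More than 8 bytes: " + s);
        }
        // Arrays.copyOf pads the array with zero bytes up to length 8.
        return ByteBuffer.wrap(Arrays.copyOf(raw, Long.BYTES)).getLong();
    }

    // Reverses pack(): writes the long into 8 bytes and drops the trailing zero padding.
    static String unpack(long value) {
        byte[] raw = ByteBuffer.allocate(Long.BYTES).putLong(value).array();
        int len = raw.length;
        while (len > 0 && raw[len - 1] == 0) {
            len--;
        }
        return new String(raw, 0, len, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        long packed = pack("Awesome!");
        byte[] eid = ByteBuffer.allocate(Long.BYTES).putLong(packed).array();
        System.out.println(packed);
        System.out.println(Arrays.toString(eid));
        System.out.println(unpack(packed));
    }
}
```

With 8 bits per character this caps the text at 8 characters; the reduced-alphabet answers trade character range for a few more characters.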

Upvotes: 0

Peter Lawrey

Reputation: 533530

You need to encode your string as a number, and reverse the encoding to get the string back.

  • you have to determine the number of symbols you will need, e.g. 64 symbols need 6 bits, 32 symbols need 5 bits.
  • this will determine the maximum length of a string, e.g. for 6 bits => 64/6 = 10 symbols, for 8 bits => 64/8 = 8 symbols. "hello long world" (16 characters) will not fit unless you assume that not all of a-z is available: 64/16 leaves only 4 bits per symbol.

Once you have done this you can encode the symbols in the same way you would parse a base-10 or base-36 number. To turn the number back into a String you do the reverse (like printing a base-10 or base-36 number).

What is the range of possible characters/symbols? (you need to include a terminating symbol as the Strings can vary in length)
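
A rough sketch of the "parse it like a base-N number" idea follows. The alphabet (space plus a-z) and all names here are assumptions chosen purely for illustration. Digit 0 is reserved as the terminating symbol, so the 27 characters take digit values 1..27 in base 28, and any string of up to 13 such characters fits in a non-negative long.

```java
// Hypothetical helper illustrating base-N encoding; names and alphabet are assumptions.
class BaseNSketch {

    // Digit 0 is the implicit terminator, so these symbols use digits 1..27.
    static final String ALPHABET = " abcdefghijklmnopqrstuvwxyz";
    static final long BASE = ALPHABET.length() + 1;

    // Works like parsing a number: value = value * base + digit for each symbol.
    static long encode(String s) {
        long value = 0;
        for (int i = 0; i < s.length(); i++) {
            long digit = ALPHABET.indexOf(s.charAt(i)) + 1;
            if (digit == 0) {
                throw new IllegalArgumentException("Unmapped character in: " + s);
            }
            // multiplyExact/addExact throw ArithmeticException if the string is too long to fit.
            value = Math.addExact(Math.multiplyExact(value, BASE), digit);
        }
        return value;
    }

    // Works like printing a number: peel off the lowest digit until nothing is left.
    static String decode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("Not a valid encoding: " + value);
        }
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            int digit = (int) (value % BASE);
            if (digit == 0) {
                throw new IllegalArgumentException("Terminator inside encoded value");
            }
            sb.append(ALPHABET.charAt(digit - 1));
            value /= BASE;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        long id = encode("hello world");
        System.out.println(id);
        System.out.println(decode(id));
    }
}
```

Because no symbol ever maps to digit 0, the length of the string is recoverable without storing it separately. "hello long world" (16 characters) overflows with this alphabet and is rejected by the exact-arithmetic calls.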

Upvotes: 4

ktm5124

Reputation: 12123

To parse a numeric String into a Long, you can use the Long wrapper class.

String myString = "1500000";
Long myLong = Long.parseLong(myString);

To stuff it into an 8-byte array...

long value = myLong.longValue();
byte[] bytes = new byte[8];
for (int i = 0; i < bytes.length; i++) {
   // shift the wanted byte down to the low 8 bits; the cast keeps only those bits
   bytes[i] = (byte) (value >>> (56 - i * 8));
}

This example is big endian.

If you're encoding a String into a long, then you can do something like:

String myString = "HELLO";
long toLong = 0;
for (int i = 0; i < myString.length(); i++) {
   long c = (long) myString.charAt(i);
   int shift = (myString.length() - 1 - i) * 8;
   toLong += c << shift;
}

This hasn't been tested. There might be a few things wrong with it.
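
For the way back from the long to the String, a sketch under a few assumptions might look like this: at most 8 characters, every character in the range 1..255 (so each fits in one byte and a zero byte can mean "nothing here"), and made-up class and method names.

```java
// Hypothetical helper; packs one character per byte, right-aligned in the long.
class ShiftPackSketch {

    static long encode(String s) {
        if (s.length() > 8) {
            throw new IllegalArgumentException("String is too long: " + s);
        }
        long out = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 0 || c > 0xFF) {
                throw new IllegalArgumentException("Character does not fit in one byte: " + s);
            }
            out = (out << 8) | c; // make room for one byte, then add the character
        }
        return out;
    }

    static String decode(long value) {
        StringBuilder sb = new StringBuilder();
        // Read the lowest byte, then shift it out; >>> avoids sign extension.
        for (long v = value; v != 0; v >>>= 8) {
            sb.append((char) (v & 0xFF));
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        long packed = encode("HELLO");
        System.out.println(Long.toHexString(packed));
        System.out.println(decode(packed));
    }
}
```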

Upvotes: 1
