vahidg

Reputation: 3963

Avoid creating 'new' String objects when converting a byte[] to String using a specific charset

I'm reading from a binary file and want to convert the bytes to US ASCII strings. Is there any way to do this without calling new on String to avoid multiple semantically equal String objects being created in the string literal pool? I'm thinking that it is probably not possible since introducing String objects using double quotes is not possible here. Is this correct?

private String nextString(DataInputStream dis, int size)
throws IOException
{
  byte[] bytesHolder = new byte[size];
  // readFully blocks until the buffer is full; a plain read() may return fewer bytes
  dis.readFully(bytesHolder);
  return new String(bytesHolder, Charset.forName("US-ASCII")).trim();
}

Upvotes: 5

Views: 2785

Answers (3)

Bombe

Reputation: 83948

You shouldn't be concerned about it unless you have profiled your application and determined that String creation is the exact source of your problem.

If you find out that the String creation is the source of your problem I would recommend what Jon Skeet proposed, i.e. a mapping from byte[] to String. That has about the same effect as interning your Strings while not hogging up valuable memory until you restart the VM.

Upvotes: 2

Jon Skeet

Reputation: 1502716

You'd have to have a cache mapping byte arrays to strings, then search through the cache for any equal values before creating a new string.
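As a rough sketch of such a byte-keyed cache (the class name ByteKeyedStringCache and the method decode are made up for illustration, and it assumes the caller never modifies a byte array after passing it in): wrapping the array in a ByteBuffer gives a key with content-based equals() and hashCode(), so a String is only constructed on a cache miss.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: caches decoded strings keyed by the raw bytes.
final class ByteKeyedStringCache {
    // ByteBuffer compares and hashes by content, unlike a bare byte[],
    // so two buffers wrapping equal bytes hit the same map entry.
    private final Map<ByteBuffer, String> cache = new HashMap<>();

    String decode(byte[] bytes) {
        // wrap() does not copy: the array must not be modified afterwards,
        // or the key's hash code would change while it sits in the map.
        ByteBuffer key = ByteBuffer.wrap(bytes);
        String cached = cache.get(key);
        if (cached == null) {
            // Cache miss: the only place a new String is created.
            cached = new String(bytes, StandardCharsets.US_ASCII).trim();
            cache.put(key, cached);
        }
        return cached;
    }
}
```

Note that a cache hit still allocates a small ByteBuffer wrapper, and that byte arrays differing only in padding are distinct keys, so they produce separate (though equal) strings after trim().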

You can intern existing strings with intern() as Yishai posted - that won't stop you from creating more strings, but it'll make all but the first one (for any char sequence) very short lived. On the other hand, it'll make all the distinct strings live for a very long time indeed.

You can have "pseudo-interning" by using a Map<String, String>:

String tmp = new String(bytesHolder, Charset.forName("US-ASCII")).trim();
String cached = cache.get(tmp);
if (cached == null)
{
    cached = tmp;
    cache.put(tmp, tmp);
}
return cached;

You could even put a bit more effort in and end up with an LRU cache so that it'll keep the N most recently fetched strings, discarding others when it needs to.
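One possible way to get that LRU behaviour, sketched here with a LinkedHashMap in access order (the names LruStringCache, canonicalize and entryCount are hypothetical, and the class is not thread-safe as written):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded "pseudo-intern" cache with LRU eviction.
final class LruStringCache {
    private final Map<String, String> cache;

    LruStringCache(final int maxEntries) {
        // accessOrder = true: iteration order runs from least to most
        // recently accessed, so the "eldest" entry is the LRU one.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Called after each put; evict once the bound is exceeded.
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached instance equal to s, caching s itself on a miss.
    String canonicalize(String s) {
        String cached = cache.get(s);
        if (cached == null) {
            cache.put(s, s);
            cached = s;
        }
        return cached;
    }

    int entryCount() {
        return cache.size();
    }
}
```

In nextString you would then return something like lru.canonicalize(tmp) instead of tmp, where lru is a shared LruStringCache instance.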

None of that reduces the number of strings created in the first place, as I say - but is that likely to be a problem in your situation? GCs have been tuned to make it very cheap to create short-lived objects.

Upvotes: 3

Yishai

Reputation: 91921

You can call the intern() method on the string to ensure there is only one instance of each distinct value in the whole JVM.

String s = new String(bytes, "US-ASCII").intern();

This won't avoid creating a new string on each call, but you will save on storage, because the duplicates become garbage right away.
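To make the effect concrete, here is a small sketch (InternDemo and decodeInterned are illustrative names only): two separately decoded byte arrays with equal content end up as the very same String instance.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of String.intern() on decoded bytes.
final class InternDemo {
    static String decodeInterned(byte[] bytes) {
        // A temporary String is still created here; intern() then returns
        // the canonical instance from the JVM-wide pool.
        return new String(bytes, StandardCharsets.US_ASCII).intern();
    }

    public static void main(String[] args) {
        String first = decodeInterned(new byte[] {'a', 'b', 'c'});
        String second = decodeInterned(new byte[] {'a', 'b', 'c'});
        // Reference comparison is intentional: both are the pooled instance.
        System.out.println(first == second);
    }
}
```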

That being said, the storage for interned strings can be limited (before Java 7, HotSpot kept them in the fixed-size permanent generation), so use intern() with caution. A better option may be to keep a HashMap with the string as both key and value: look the string up, return the existing instance if there is one, and insert it if there is not. That way you won't have such memory limitations.

Upvotes: 3
