user3298812

Reputation: 33

How to understand the Java String source code

How does it work? I can't understand how it comes about that a new string is created every time you change something in the original one. What do offset, value and count stand for?

private final char value[];
private final int offset;
private final int count;

public String() {
    this.offset = 0;
    this.count = 0;
    this.value = new char[0];
}

public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself.  Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off+size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}

public String(char value[]) {
    int size = value.length;
    this.offset = 0;
    this.count = size;
    this.value = Arrays.copyOf(value, size);
}

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.offset = 0;
    this.count = count;
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

public String concat(String str) {
    int otherLen = str.length();
    if (otherLen == 0) {
        return this;
    }
    char buf[] = new char[count + otherLen];
    getChars(0, count, buf, 0);
    str.getChars(0, otherLen, buf, count);
    return new String(0, count + otherLen, buf);
}

public static String valueOf(char data[]) {
    return new String(data);
}

public static String valueOf(char c) {
    char data[] = {c};
    return new String(0, 1, data);
}

Upvotes: 1

Views: 222

Answers (2)

RealSkeptic

Reputation: 34628

...every time a new string has been created once you change something in the original one.

I assume you are talking about the immutability of String. It's not something complex or clever. Rather, no operation on a String changes the original one. Each operation either copies its result into a new String or returns an unchanged reference to the old one.

A string is based on a character array, and the various string operations read from that character array. When a new string that differs from the old one is to be created, a new character array is allocated and the data is copied over from the old string with the changes applied. Then a new String object is made from that character array.

For example, the concat method prepares a new character array, copies over the data from the two Strings (the current one and the one passed as a parameter), and then makes a new String object backed by this new character array. The two old String objects are not changed.
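As a rough sketch of that behaviour using only public String methods: the helper below mirrors the steps of the quoted concat (new array, two copies, new String). The class name ConcatSketch and its concat helper are made up for illustration and are not part of the JDK.

```java
final class ConcatSketch {
    // Hypothetical helper that follows the same steps as the quoted concat:
    // allocate a fresh array, copy both inputs into it, wrap it in a new String.
    static String concat(String a, String b) {
        if (b.length() == 0) {
            return a; // nothing to append, so the original reference is returned
        }
        char[] buf = new char[a.length() + b.length()];
        a.getChars(0, a.length(), buf, 0);          // copy a into the start of buf
        b.getChars(0, b.length(), buf, a.length()); // copy b right after it
        return new String(buf);
    }

    public static void main(String[] args) {
        String a = "foo";
        String b = "bar";
        String c = concat(a, b);
        System.out.println(a); // foo (unchanged)
        System.out.println(b); // bar (unchanged)
        System.out.println(c); // foobar
        System.out.println(concat(a, "") == a); // true, same object returned
    }
}
```

Neither input is written to at any point; the only array that is modified is the freshly allocated buf.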

Note, though, that the version you have posted is from Java 6. Before Java 7, the authors of Java wanted to allocate less memory by allowing substring operations to point to the original character array. The idea was that since the original, long character array is never going to change (because none of the methods ever modifies that array), all its substrings can be backed by the same character array, provided you define a string by three items:

  1. Which character array is backing it (value).
  2. Where in that character array our current string starts (offset).
  3. How many characters in that character array are considered part of our current string (count).

So, a string such as "ABC" can be represented as:

  1. char array: { 'A', 'B', 'C' }, offset: 0, count: 3,
  2. char array: { 'A', 'B', 'C', 'D', 'E' }, offset: 0, count: 3,
  3. char array: { 'F', 'O', 'O', 'A', 'B', 'C' }, offset: 3, count: 3

All of these are valid implementations that Java (up to version 6) will consider to be "ABC".
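The three representations above can be sketched with a small toy class. SharedChars is a hypothetical name, not the real java.lang.String; it only models the value/offset/count triple and, unlike the public String constructors, shares the array it is given instead of copying it.

```java
final class SharedChars {
    private final char[] value; // backing array, shared rather than copied
    private final int offset;   // index in value where this string starts
    private final int count;    // number of characters that belong to this string

    SharedChars(char[] value, int offset, int count) {
        if (offset < 0 || count < 0 || offset > value.length - count) {
            throw new StringIndexOutOfBoundsException("offset " + offset + ", count " + count);
        }
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    @Override
    public String toString() {
        // Only the window [offset, offset + count) is visible to callers.
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedChars r1 = new SharedChars(new char[] {'A', 'B', 'C'}, 0, 3);
        SharedChars r2 = new SharedChars(new char[] {'A', 'B', 'C', 'D', 'E'}, 0, 3);
        SharedChars r3 = new SharedChars(new char[] {'F', 'O', 'O', 'A', 'B', 'C'}, 3, 3);
        System.out.println(r1 + " " + r2 + " " + r3); // ABC ABC ABC
    }
}
```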

This trick allowed them to avoid copying arrays when doing the substring operation. It doesn't cause the string to be immutable. It's a trick that's based on the fact that String is immutable.

However, in Java 7 (from update 6 onward), this trick was abandoned, and now the only valid representation of "ABC" in Java is #1 above. The reason is that this trick actually caused memory leaks. You could create a huge string, take a few tiny substrings of it, then get rid of all references to the huge string... but its character array still would not be garbage-collected. Why? Because the tiny substrings were still referring to the huge internal character array.
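The retention problem can be illustrated with a toy model of the old layout. SliceLeakSketch, retainedChars and trimmed are invented names for this sketch; trimmed plays the role that the quoted String(String original) constructor plays when it "trims the baggage". The sketch only counts array lengths and does not measure real garbage-collector behaviour.

```java
import java.util.Arrays;

final class SliceLeakSketch {
    final char[] value;
    final int offset;
    final int count;

    SliceLeakSketch(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Old-style substring: a new object that points at the same backing array.
    SliceLeakSketch substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException("begin " + begin + ", end " + end);
        }
        return new SliceLeakSketch(value, offset + begin, end - begin);
    }

    // Copies just the visible window, so the large array is no longer referenced.
    SliceLeakSketch trimmed() {
        return new SliceLeakSketch(Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }

    // How many chars this object keeps reachable through its backing array.
    int retainedChars() {
        return value.length;
    }

    public static void main(String[] args) {
        SliceLeakSketch huge = new SliceLeakSketch(new char[1_000_000], 0, 1_000_000);
        SliceLeakSketch tiny = huge.substring(10, 13);
        huge = null; // the big object is unreachable, but its array is not
        System.out.println(tiny.retainedChars());           // 1000000
        System.out.println(tiny.trimmed().retainedChars()); // 3
    }
}
```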

To sum up:

  • String immutability is achieved by making sure that there is no method that changes the backing character array. All operations are read operations. You can see that the methods that return String values invoke new String(...) to return. And the various constructors do not change anything in the original strings passed to them as parameters.
  • The offset and count trick, now obsolete, relied on immutability to save copy operations. It did not cause immutability, just relied on it.

Upvotes: 2

Random832

Reputation: 39000

The value is the underlying character array of the string. The offset is where the string starts within that array, and the count is how long it is. A string may be backed by the array {'a','b','c','d','e'} with count 3 and offset 1, in which case it is "bcd". This way the array doesn't have to be copied for every substring operation.
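The same offset and count reading can be tried with the public String(char[], int, int) constructor. Note that this public constructor copies the selected range (as the quoted source shows), so the sketch illustrates the meaning of offset and count rather than the array sharing. OffsetCountDemo and slice are illustrative names.

```java
final class OffsetCountDemo {
    // Builds a String from count characters of data, starting at offset.
    static String slice(char[] data, int offset, int count) {
        return new String(data, offset, count);
    }

    public static void main(String[] args) {
        char[] data = {'a', 'b', 'c', 'd', 'e'};
        String s = slice(data, 1, 3);
        System.out.println(s); // bcd

        // The constructor copied the range, so later writes to data
        // cannot reach the String.
        data[1] = 'X';
        System.out.println(s); // bcd
    }
}
```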

Upvotes: 2
