Wednesday, March 28, 2012

Strings, Literally

String literals are a little "special" in Java in the way that they are treated.
What is a String Literal? A String literal is a sequence of characters between quotation marks, such as "string" or "literal".

Strings are Immutable

So what makes String literals so special? Well, first of all, it's very important to remember that String objects are immutable. That means that, once created, a String object cannot be changed (short of using something like reflection to get at private data). Immutable, you say? Unchangeable? What about this code?

public class ImmutableStrings
{
    public static void main(String[] args)
    {
        String start = "Hello";
        String end = start.concat(" World!");
        System.out.println(end);
    }
}

// Output

Hello World!
 
Well, look at that, the String changed... or did it? In that code, no 
String object was ever changed. We start by assigning "Hello" to our 
variable, start. That stores in start a reference to a String object 
that lives on the heap. Next, we invoke the method concat(String) on 
that object. Here's the trick: if you look at the API Spec for String, 
you'll see this in the description of the concat(String) method:

Concatenates the specified string to the end of this string.

If the length of the argument string is 0, then this String object is returned. Otherwise, a new String object is created, representing a character sequence that is the concatenation of the character sequence represented by this String object and the character sequence represented by the argument string.

Examples:
    "cares".concat("s") returns "caress"
    "to".concat("get").concat("her") returns "together"

Parameters:
    str - the String that is concatenated to the end of this String.

Returns:
    a string that represents the concatenation of this object's characters followed by the string argument's characters.

Notice the sentence that begins "Otherwise, a new String object is created." When you concatenate one String to another, it doesn't actually change the String object, it simply creates a new one that contains the contents of both of the original Strings, one after the other. That's exactly what we did above.
 
The String object referenced by the local variable start never changed. 
In fact, if you added the statement System.out.println(start); after you 
invoked the concat method, you would see that start still referenced a 
String object that contained just "Hello". And just in case you were 
wondering, the '+' operator behaves the same way in this respect: it 
produces a new String rather than modifying either of its operands.
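
If you'd rather check this than take my word for it, here is a minimal sketch. The class name ConcatLeavesOriginalAlone and the demo() helper are made up for this illustration; only concat() and '+' come from the discussion above.

```java
// Hypothetical class name, chosen only for this sketch.
class ConcatLeavesOriginalAlone
{
    // Returns { start, viaConcat, viaPlus } so the values can be inspected.
    static String[] demo()
    {
        String start = "Hello";
        String viaConcat = start.concat(" World!"); // creates a new String object
        String viaPlus = start + " World!";         // also creates a new String object
        return new String[] { start, viaConcat, viaPlus };
    }

    public static void main(String[] args)
    {
        String[] result = demo();
        System.out.println(result[0]); // Hello
        System.out.println(result[1]); // Hello World!
        System.out.println(result[2]); // Hello World!
        // Same contents, but two separate objects were created.
        System.out.println(result[1] == result[2]); // false
    }
}
```

The variable start still refers to "Hello" after both operations, which is all that immutability promises here.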

Strings really are immutable.
 

Storage of Strings - The String Literal Pool

What is the String Literal Pool? Most often, I hear people say that it is a collection of String objects. Although that's close, it's not exactly correct. Really, it's a collection of references to String objects. Strings, even though they are immutable, are still objects like any other in Java. Objects are created on the heap and Strings are no exception. So, Strings that are part of the "String Literal Pool" still live on the heap, but they have references to them from the String Literal Pool.

Yeah, so that doesn't really explain what the pool is, or what it's for, does it? Well, because String objects are immutable, it's safe for multiple references to "share" the same String object. Take a look at this example:
 
public class ImmutableStrings
{
    public static void main(String[] args)
    {
        String one = "someString";
        String two = "someString";
        
        System.out.println(one.equals(two));
        System.out.println(one == two);
    }
}

// Output

true
true 
 
In such a case, there is really no need to make two instances of an 
identical String object. If a String object could be changed, as a 
StringBuffer can be changed, we would be forced to create two separate 
objects. But, as we know that String objects cannot change, we can 
safely share a String object among the two String references, one and two.
This is done through the String literal pool. Here's how it is accomplished:


When a .java file is compiled into a .class file, any String literals 
are noted in a special way, just as all constants are. When a class is loaded
(note that loading happens prior to initialization), the JVM goes 
through the constants for the class and looks for String literals. When it 
finds one, it checks to see if an equivalent String is already 
referenced from the String Literal Pool. If not, it creates a String 
instance on the heap and stores a reference to that object in the 
constant table. (Exactly when this happens is up to the JVM; some 
implementations put it off until the literal is first used, but the 
result you observe is the same.) Once a reference is made to that String 
object, any references to that String literal throughout your program 
are simply replaced with the reference to the object referenced from 
the String Literal Pool.
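
As a rough sketch of what that sharing means in practice, the example below puts the same literal in two different classes and compares the references. The class names LiteralHolderA, LiteralHolderB and SharedLiterals are invented for this illustration.

```java
// Two unrelated (hypothetical) classes that each contain the same literal.
class LiteralHolderA
{
    static String literal() { return "someString"; }
}

class LiteralHolderB
{
    static String literal() { return "someString"; }
}

class SharedLiterals
{
    // True only if both classes hand back the very same String object.
    static boolean sameObject()
    {
        return LiteralHolderA.literal() == LiteralHolderB.literal();
    }

    public static void main(String[] args)
    {
        System.out.println(sameObject()); // true
    }
}
```

Both methods return the reference held in the String Literal Pool, so the == comparison succeeds even though the literal was written in two places.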



So, in the example shown above, there would be only one entry in the 
String Literal Pool, which would refer to a String object that contained
the word "someString". Both of the local variables, one and two,
would be assigned a reference to that single String object. You can see
that this is true by looking at the output of the above program. While 
the equals() method checks to see if the String objects contain the same
data ("someString"), the == operator, when 
used on objects, checks for referential equality - that means that it 
will return true if and only if the two reference variables refer to the
exact same object. In such a case, the references are equal. From the 
above output, you can see that the local variables, one and two, not only 
refer to Strings that contain the same data, they refer to the same object.

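Graphically, our objects and references would look something like this: 
the local variables, one and two, and the entry in the String Literal Pool 
all point to a single String object on the heap.

Note, however, that this is a special behavior for String Literals.
Constructing Strings using the "new" keyword implies a different sort of behavior.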
 
Let's look at an example:
 
public class ImmutableStrings
{
    public static void main(String[] args)
    {
        String one = "someString";
        String two = new String("someString");
        
        System.out.println(one.equals(two));
        System.out.println(one == two);
    }
}

// Output

true
false
 
In this case, we actually end up with a slightly different behavior 
because of the keyword "new." In such a case, a reference to the String 
literal "someString" (which appears twice in the source) is still put 
into the constant table (the String Literal Pool), but, when you come to 
the keyword "new," the JVM is obliged to create a new String object at 
run-time, rather than using the one from the constant table.

In such a case, although the two String references refer to String objects 
that contain the same data, "someString", they do not refer to the same object.
That can be seen from the output of the program. While the equals() method 
returns true, the == operator, which checks for referential equality, returns
false, indicating that the two variables refer to distinct String objects.

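Once again, if you'd like to picture this graphically: one refers to the 
String object referenced from the String Literal Pool, while two refers 
to a second, separate String object on the heap. Note that the String 
object referenced from the String Literal Pool is created as part of 
loading the class (or, on some JVMs, when the literal is first used), 
while the other String object is created at runtime, when the 
"new String..." line is executed.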
 
If you'd like to get both of these local variables to refer to the same object, 
you can use the intern() method defined in String. Invoking two.intern()
will look for a String object referenced from the String Literal Pool 
that has the same value as the one you invoked the intern method upon. 
If one is found, a reference to that String is returned and can be 
assigned to your local variable (two = two.intern();). Calling intern() 
without using its return value changes nothing. If you did make that 
assignment, you'd have a picture that looks just like the first one, 
with both local variables, one and two, referring to the same String 
object, which is also referenced from the String Literal Pool. At that 
point, the second String object, which was created at run-time, would 
be eligible for garbage collection.
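
Here is a small sketch of that intern() step. The class name InternDemo and its demo() helper are hypothetical; the variables one and two mirror the example above.

```java
// Hypothetical class name, used only for this sketch.
class InternDemo
{
    // Returns { sameBeforeIntern, sameAfterIntern }.
    static boolean[] demo()
    {
        String one = "someString";
        String two = new String("someString");

        boolean before = (one == two); // two distinct objects

        // intern() returns the pooled reference; it must be assigned back.
        two = two.intern();
        boolean after = (one == two);  // now the same object

        return new boolean[] { before, after };
    }

    public static void main(String[] args)
    {
        boolean[] result = demo();
        System.out.println(result[0]); // false
        System.out.println(result[1]); // true
    }
}
```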
 

Garbage Collection

What makes an object eligible for garbage collection? If you're preparing for the SCJP exam (or even if you're not), the answer to that question should roll right off your tongue. An object is eligible for garbage collection when it is no longer referenced from an active part of the application. Anyone see what is special about garbage collection for String literals? Let's look at an example and see if you can see where this is going.

public class ImmutableStrings
{
    public static void main(String[] args)
    {
        String one = "someString";
        String two = new String("someString");
        
        one = two = null;
    }
}

Just before the main method ends, how many objects are eligible for garbage 
collection? 0? 1? 2? 

The answer is 1. Unlike most objects, String literals always have a 
reference to them from the String Literal Pool. That means that they always
have a reference to them and are, therefore, not eligible for garbage 
collection. This is the same example I used above, so you can see there 
what our picture looked like originally. Once we set our variables, one 
and two, to null, the String object created with "new" has no references 
left, while the String object for the literal is still referenced from 
the String Literal Pool.
 
 
 
As you can see, even though neither of our local variables, one or two,
refers to our String object, there is still a reference to it from the 
String Literal Pool. Therefore, the object is not eligible for garbage 
collection. The object is always reachable through use of the intern() 
method, as referred to earlier.

Conclusion

Virtually none of this information is included on the SCJP exam. However, I constantly see this question coming up in the SCJP forum and on various mock exams. These are a few of the highlights you can keep in mind when it comes to String literals:
  • Equivalent String Literals (even those stored in separate classes in separate packages) will refer to the same String object.
  • In general, String Literals are not eligible for garbage collection. Ever.
  • Strings created at run-time (for example, with the "new" keyword) are distinct objects from those created from String Literals, even when they contain the same characters.
  • You can reuse String Literals with run-time Strings by utilizing the intern() method.
  • The best way to check for String equality is to use the equals() method.
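
To tie the last three points together, here is a short sketch. The class name CompareStrings and the build() helper are invented for this illustration; build() simply concatenates its arguments at run-time.

```java
// Hypothetical class name, used only for this sketch.
class CompareStrings
{
    // Concatenating two variables happens at run-time, so this yields a new String.
    static String build(String a, String b)
    {
        return a + b;
    }

    // Returns { literal == built, literal.equals(built), literal == built.intern() }.
    static boolean[] demo()
    {
        String literal = "someString";
        String built = build("some", "String");

        return new boolean[] {
            literal == built,         // different objects
            literal.equals(built),    // same contents
            literal == built.intern() // intern() hands back the pooled object
        };
    }

    public static void main(String[] args)
    {
        boolean[] result = demo();
        System.out.println(result[0]); // false
        System.out.println(result[1]); // true
        System.out.println(result[2]); // true
    }
}
```

Only equals() gives the answer you usually want without caring where each String came from, which is why it is the safer default.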
 
