Nick

Reputation: 3435

Garbage collection vs manual memory management

This is a very basic question. I will formulate it using C++ and Java, but it's really language-independent. Consider a well-known problem in C++:

struct Obj
{
    boost::shared_ptr<Obj> m_field;
};

{
    boost::shared_ptr<Obj> obj1(new Obj);
    boost::shared_ptr<Obj> obj2(new Obj);
    obj1->m_field = obj2;
    obj2->m_field = obj1;
}

This is a memory leak, and everybody knows it :). The solution is also well-known: one should use weak pointers to break the "refcount interlocking". It is also known that this problem cannot, in principle, be resolved automatically. It is solely the programmer's responsibility to resolve it.

But there's a positive thing: the programmer has full control over refcount values. I can pause my program in a debugger, examine the refcounts of obj1 and obj2, and understand that there's a problem. I can also set a breakpoint in an object's destructor and observe the moment of destruction (or find out that the object has not been destroyed).

My question is about Java, C#, ActionScript and other "Garbage Collection" languages. I might be missing something, but in my opinion they

  1. Do not let me examine the refcount of objects
  2. Do not let me know when an object is destroyed (okay, when an object is exposed to GC)

I often hear that these languages just do not allow a programmer to leak memory and that's why they are great. As far as I understand, they just hide memory management problems and make them harder to solve.

Finally, the questions themselves:

Java:

public class Obj
{
    public Obj m_field;
}

{
     Obj obj1 = new Obj();
     Obj obj2 = new Obj();
     obj1.m_field = obj2;
     obj2.m_field = obj1;
}

  1. Is it a memory leak?
  2. If yes: how do I detect and fix it?
  3. If no: why?

Upvotes: 11

Views: 11979

Answers (6)

Peter Lawrey

Reputation: 533472

Managed memory systems are built on the assumption that you don't want to be tracing memory leak issues in the first place. Instead of making leaks easier to diagnose, they try to make sure leaks never happen.

Java does have a loose notion of a "memory leak", meaning any growth in memory that could impact your application, but there is never a point at which the managed runtime is unable to reclaim memory that is no longer reachable.

The JVM doesn't use reference counting for a number of reasons.

  • It cannot handle circular references, as you have observed.
  • It requires significant memory and threading overhead to maintain accurately.
  • There are much better, simpler ways of handling such situations for managed memory.

While the JLS doesn't ban the use of reference counts, it is not used in any JVM AFAIK.

Instead, Java keeps track of a number of root contexts (e.g. each thread stack) and traces which objects need to be kept and which can be discarded based on whether those objects are strongly reachable. It also provides weak references (which do not keep their referent alive and are cleared once the referent is collected) and soft references (which are generally kept but can be cleared at the garbage collector's discretion, typically under memory pressure).
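
A minimal sketch of what strong reachability means for the cycle in the question. It assumes a HotSpot-like JVM on which System.gc() actually triggers a collection (the specification treats it only as a hint), so the printed result is expected rather than guaranteed. CycleProbe, buildCycle and collectedAfterGc are names made up for illustration; Obj mirrors the class from the question. The WeakReference is used purely as a probe: it does not keep the cycle alive, and get() returns null once the referent has been collected.

```java
import java.lang.ref.WeakReference;

public class CycleProbe {
    static class Obj {
        Obj m_field;
    }

    // Builds the two-object cycle from the question.
    static Obj buildCycle() {
        Obj obj1 = new Obj();
        Obj obj2 = new Obj();
        obj1.m_field = obj2;
        obj2.m_field = obj1;
        return obj1;
    }

    // Returns only a weak reference, so no strong reference to the
    // cycle survives once this method has returned.
    static WeakReference<Obj> makeUnreachableCycle() {
        return new WeakReference<>(buildCycle());
    }

    // Requests a collection a few times. System.gc() is only a hint,
    // so this is best effort rather than a guarantee.
    static boolean collectedAfterGc(WeakReference<?> ref) {
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        WeakReference<Obj> probe = makeUnreachableCycle();
        System.out.println("cycle collected: " + collectedAfterGc(probe));
    }
}
```

The two objects still point at each other, but nothing reachable from a root points at either of them, which is all a tracing collector looks at.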

Upvotes: 12

Techcable

Reputation: 66

Java has a unique memory management strategy. Everything (except a few specific things) is allocated on the heap and isn't freed until the GC gets to work.

For example:

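public class Obj {
    public Object example;
    public Obj m_field;

    // Simple trial division; good enough for a demonstration.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int lastPrime = 1;
        while (true) {
            // Each iteration allocates a fresh two-object cycle plus one extra object.
            Obj obj1 = new Obj();
            Obj obj2 = new Obj();
            obj1.example = new Object();
            obj1.m_field = obj2;
            obj2.m_field = obj1;
            // Search for the next prime after lastPrime.
            int prime = lastPrime + 1;
            while (!isPrime(prime)) {
                prime++;
            }
            lastPrime = prime;
            System.out.println("Found a prime: " + prime);
        }
    }
}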

C handles this situation by requiring you to manually free the memory of both Obj instances, while C++ with reference-counted smart pointers such as shared_ptr destroys an object as soon as its last owner goes out of scope (reference cycles excepted, as the question shows). Java does not free this memory, at least not at first.

The Java runtime waits a while until it feels like there is too much memory being used. After that the Garbage collector kicks in.

Let's say the Java garbage collector decides to clean up after the 10,000th iteration of the outer loop. By this time, about 30,000 objects have been created, three per iteration (most of which would already have been freed in C/C++).

Although there are 10,000 iterations of the outer loop, only the newly created obj1 and obj2 could possibly be referenced by the code.

These are the GC 'roots', which Java uses to find all objects that could possibly be referenced. The garbage collector then recursively walks the object graph from them, marking 'example' as live in addition to the garbage collector roots.

All those other objects are then destroyed by the garbage collector. This does come with a performance penalty, but this process has been heavily optimized, and isn't significant for most applications.
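
The mark step described above can be sketched as a toy simulation. This is not how the JVM is implemented; it only models the idea with a hypothetical Node class and a hand-built list of roots. Everything reachable from the roots is marked, and whatever is left unmarked is what a collector would be free to reclaim, cycles included.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ToyMark {
    // Hypothetical stand-in for a heap object with outgoing references.
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Mark phase: returns every node reachable from the given roots.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit only, so cycles terminate
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    // "Sweep": names of the heap nodes that were not marked.
    static List<String> garbage(List<Node> heap, List<Node> roots) {
        Set<Node> live = mark(roots);
        List<String> dead = new ArrayList<>();
        for (Node n : heap) {
            if (!live.contains(n)) dead.add(n.name);
        }
        return dead;
    }

    static List<String> demo() {
        Node obj1 = new Node("obj1"), obj2 = new Node("obj2"), example = new Node("example");
        Node oldA = new Node("oldA"), oldB = new Node("oldB");
        obj1.refs.add(obj2);
        obj2.refs.add(obj1);
        obj1.refs.add(example);
        oldA.refs.add(oldB);              // cycle left over from an earlier iteration
        oldB.refs.add(oldA);
        List<Node> heap = Arrays.asList(obj1, obj2, example, oldA, oldB);
        return garbage(heap, Arrays.asList(obj1, obj2));
    }

    public static void main(String[] args) {
        System.out.println(demo());       // prints [oldA, oldB]
    }
}
```

The leftover cycle oldA/oldB is reported as garbage even though each node still references the other, because neither can be reached from a root.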

Unlike in C++, you don't have to worry about reference cycles at all, since only objects reachable from the GC roots will live.

With Java applications you do still have to worry about memory (think of lists holding onto the objects from all iterations), but it isn't as significant a concern as in other languages.

As for debugging: Java's approach to debugging high memory usage is to use a dedicated memory analyzer to find out which objects are still on the heap, rather than tracking by hand what is referencing what.
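
The kind of leak that does exist in Java can be sketched as follows: objects stay alive simply because something reachable still refers to them. RetentionDemo and CACHE are made-up names; the static list plays the role of the list that holds onto objects from every iteration.

```java
import java.util.ArrayList;
import java.util.List;

public class RetentionDemo {
    // A static field stays reachable for as long as the class is loaded,
    // so everything stored in this list stays alive as well.
    static final List<byte[]> CACHE = new ArrayList<>();

    // Simulates a loop that forgets to drop its per-iteration data.
    static int leakyLoop(int iterations) {
        for (int i = 0; i < iterations; i++) {
            CACHE.add(new byte[1024]);
        }
        return CACHE.size();
    }

    // The fix is ordinary code: drop the references, and the buffers
    // become unreachable and therefore eligible for collection.
    static int release() {
        CACHE.clear();
        return CACHE.size();
    }

    public static void main(String[] args) {
        System.out.println("retained: " + leakyLoop(1000));
        System.out.println("after release: " + release());
    }
}
```

No collector can reclaim the buffers while CACHE still lists them, since from its point of view they are in use.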

Upvotes: 2

Fulvio Esposito

Reputation: 153

Garbage collected languages don't let you inspect a refcount because they don't have one. Garbage collection is an entirely different thing from refcounted memory management. The real difference is in determinism.

{
std::fstream file( "example.txt" );
// do something with file
}
// ... later on
{
std::fstream file( "example.txt" );
// do something else with file
}

In C++ you have the guarantee that example.txt has been closed once the first block is exited, whether normally or because an exception was thrown. Compare this with Java:

{
FileInputStream file = null; // declared outside try so that finally can see it
try 
  {
  file = new FileInputStream( "example.txt" );
  // do something with file
  }
finally
  {
  if( file != null )
    file.close();
  }
}
// ..later on
{
FileInputStream file = null; // declared outside try so that finally can see it
try 
  {
  file = new FileInputStream( "example.txt" );
  // do something with file
  }
finally
  {
  if( file != null )
    file.close();
  }
}

As you see, you have traded memory management for the management of all other resources. That is the real difference: refcounted objects still keep deterministic destruction. In garbage collected languages you must manually release resources and check for exceptions. One may argue that explicit memory management can be tedious and error prone, but in modern C++ it is mitigated by smart pointers and standard containers. You still have some responsibilities (circular references, for example), but think of how many catch/finally blocks you can avoid by using deterministic destruction, and how much typing a Java/C#/etc. programmer must do instead (as they have to manually close/release resources other than memory). And I know that there's the using statement in C# (and try-with-resources in Java 7 and later), but it covers only block-scope lifetime and not the more general problem of shared ownership.
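
For the block-scope case, the Java counterpart of C#'s using is try-with-resources (Java 7 and later). A minimal sketch, assuming a hypothetical TrackedResource class (not a real library type) that logs when it is opened and closed, to show that close() runs at the end of each block much like a C++ destructor would:

```java
import java.util.ArrayList;
import java.util.List;

public class ScopeDemo {
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical resource that records when it is opened and closed.
    static class TrackedResource implements AutoCloseable {
        private final String name;

        TrackedResource(String name) {
            this.name = name;
            LOG.add("open " + name);
        }

        @Override
        public void close() {
            LOG.add("close " + name);
        }
    }

    static List<String> run() {
        LOG.clear();
        try (TrackedResource first = new TrackedResource("first")) {
            LOG.add("use " + first.name);
        }
        // close() has already run here, and would have even if the body had thrown.
        try (TrackedResource second = new TrackedResource("second")) {
            LOG.add("use " + second.name);
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

This only ties the resource to one lexical block; a resource shared between several owners still needs an explicit ownership convention, which is the limitation noted above.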

Upvotes: 0

Benj

Reputation: 32398

Garbage collection is not simple ref counting.

The circular reference problem you demonstrate will not occur in a garbage collected managed language, because the garbage collector traces references all the way back to a root such as something on the stack. If no root leads to an object, it's garbage. Ref counting systems like shared_ptr are not that smart, and it's possible (as you demonstrate) to have two objects somewhere in the heap which keep each other from being deleted.

Upvotes: 1

OldCurmudgeon

Reputation: 65793

The critical difference is that in Java etc. you are not involved in the disposal problem at all. This may feel like a pretty scary position to be in, but it is surprisingly empowering. All the decisions you used to have to make as to who is responsible for disposing of a created object are gone.

It does actually make sense. The system knows much more about what is reachable and what is not than you. It can also make much more flexible and intelligent decisions about when to tear down structures etc.

Essentially - in this environment you can juggle objects in a much more complex way without worrying about dropping one. The only thing you now need to worry about is if you accidentally glue one to the ceiling.

As an ex C programmer having moved to Java I feel your pain.

Re your final question: it is not a memory leak. When GC kicks in, everything is discarded except what is reachable. In this case, assuming you have released obj1 and obj2, neither is reachable, so they will both be discarded.

Upvotes: 1

AFAIK, Java GC works by starting from a set of well-defined initial references and computing a transitive closure of objects which can be reached from these references. Anything not reachable is "leaked" and can be GC-ed.

Upvotes: 6
