What is Garbage Collection?
Garbage collection is the process of destroying unused (unreferenced) objects. In Java, it is the process by which programs perform automatic memory management.
When a Java program runs on the JVM, objects are created on the heap. When an object is no longer needed, the garbage collector finds it and removes it, freeing that memory for reuse.
How Does Garbage Collection Work?
Java garbage collection is an automatic process, so we don’t need to explicitly mark objects for deletion. The garbage collection implementation lives in the JVM. We can only request a garbage collection from the JVM by calling System.gc() or Runtime.getRuntime().gc(). We can’t force one.
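The following minimal sketch makes the "request, not force" point concrete. It allocates throwaway arrays and then calls both methods. The class name GcRequestDemo is hypothetical. Whether the used-heap figure drops afterwards depends on the JVM and its flags. HotSpot, for example, ignores explicit requests when it is started with -XX:+DisableExplicitGC. Treat the printed numbers as illustrative only.

```java
// Minimal sketch: System.gc() and Runtime.getRuntime().gc() only request a collection.
class GcRequestDemo {

    // Heap bytes currently in use, as reported by the Runtime API.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Allocate short-lived arrays that nothing keeps a reference to.
        for (int i = 0; i < 100_000; i++) {
            byte[] temp = new byte[1024];
            temp[0] = 1;
        }

        long before = usedHeapBytes();
        System.gc();               // a request, not a command
        Runtime.getRuntime().gc(); // the same request through the Runtime object
        long after = usedHeapBytes();

        // Nothing guarantees that 'after' is smaller than 'before'.
        System.out.println("Used heap before: " + before + " bytes");
        System.out.println("Used heap after:  " + after + " bytes");
    }
}
```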
Different JVMs implement garbage collection in their own way. Oracle’s HotSpot is the most common JVM and offers a robust and mature set of garbage collection options. HotSpot has multiple garbage collectors optimized for different use cases, but all of them follow the same basic principle.
In the first step, all the live objects are tracked and marked as alive. Everything else is eligible for garbage collection.
In the second step, the objects that are eligible for garbage collection are deleted from the heap. Optionally, memory can be compacted so that the remaining objects sit in a contiguous block at the start of the heap. Compaction makes it easier to allocate memory for new objects sequentially, right after the block occupied by existing objects.
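The sketch below gives a rough illustration of compaction. It models the heap as an array of slots and slides the live objects toward index 0, which leaves one contiguous free block at the end. It is a toy under stated assumptions: the class CompactionSketch and the live flags are hypothetical. HotSpot's real compaction must also update every reference to a moved object, which this model skips.

```java
import java.util.Arrays;

// Toy model: the heap is an array of slots and null marks a free slot.
// Illustrates sliding compaction only; this is not HotSpot's implementation.
class CompactionSketch {

    // Slide every live object toward index 0, keeping their order, and
    // return the index of the first free slot (where the next allocation goes).
    static int compact(String[] heap, boolean[] live) {
        int free = 0; // next slot to copy a live object into
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live[i]) {
                heap[free++] = heap[i];
            }
        }
        // Everything after the last live object is now one contiguous free block.
        Arrays.fill(heap, free, heap.length, null);
        return free;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E"};
        boolean[] live = {true, false, true, false, true};
        int top = compact(heap, live);
        System.out.println(Arrays.toString(heap));                  // [A, C, E, null, null]
        System.out.println("next allocation goes at index " + top); // 3
    }
}
```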
All objects allocated on the heap are managed by the JVM. As long as an object is being referenced, the JVM considers it alive.
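A java.lang.ref.WeakReference offers one way to observe "alive while referenced" from Java code, because it does not keep its referent alive. In this minimal sketch (the class name is hypothetical), the first check is guaranteed to succeed because a strong local reference is still in use. The second result may vary, since System.gc() is only a request.

```java
import java.lang.ref.WeakReference;

// Sketch: a WeakReference does not keep its referent alive, so it lets us
// observe when the JVM stops considering an object reachable.
class ReachabilityDemo {

    static boolean aliveWhileReferenced() {
        Object payload = new Object();                 // strongly referenced by a local variable
        WeakReference<Object> weak = new WeakReference<>(payload);
        System.gc();                                   // request a collection
        // 'payload' is used in this comparison, so it is still strongly reachable
        // here and the weak reference cannot have been cleared.
        return weak.get() == payload;
    }

    public static void main(String[] args) {
        System.out.println("alive while referenced: " + aliveWhileReferenced());

        // Only a weak reference points at this object, so it is eligible for GC.
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();
        // May print true or false: System.gc() is only a request.
        System.out.println("cleared after becoming unreachable: " + (weak.get() == null));
    }
}
```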
All of the HotSpot garbage collectors covered in this article implement a Generational Garbage Collection strategy, which categorizes objects by their age. The strategy is based on the assumption that most objects are short-lived and become eligible for garbage collection soon after they are created.
The heap is divided into three sections:
- Young Generation: Newly created objects start in the Young Generation, which is further subdivided into Eden and two Survivor spaces (S1 and S2). New objects are allocated in Eden and are moved to a Survivor space after surviving one GC cycle. A garbage collection of the Young Generation is called a minor GC event.
- Old Generation: Long-lived objects are moved from the Young Generation to the Old Generation. A garbage collection of the Old Generation is called a major GC event.
- Permanent Generation (PermGen): Metadata (classes and methods) is stored in the Permanent Generation. Classes that are no longer required may be garbage collected from the Permanent Generation.
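The movement between spaces can be sketched as a toy simulation. Everything here is hypothetical and simplified: the class PromotionSketch, the live flag that stands in for reachability, and the threshold of 3 survived minor GCs. HotSpot's real tenuring threshold is adjusted at runtime and bounded by -XX:MaxTenuringThreshold.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of generational promotion; not HotSpot's implementation.
// TENURING_THRESHOLD is a hypothetical fixed value chosen for illustration.
class PromotionSketch {

    static final int TENURING_THRESHOLD = 3;

    static final class Obj {
        final String name;
        boolean live = true; // false once nothing references the object
        int age = 0;         // number of minor GCs survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> survivor = new ArrayList<>(); // the survivor space currently in use
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o); // every new object starts in Eden
        return o;
    }

    // Minor GC: copy live objects out of Eden and the active survivor space.
    void minorGc() {
        List<Obj> otherSurvivor = new ArrayList<>(); // the empty survivor space
        List<Obj> young = new ArrayList<>(eden);
        young.addAll(survivor);
        for (Obj o : young) {
            if (!o.live) continue;          // dead objects are simply not copied
            o.age++;
            if (o.age >= TENURING_THRESHOLD) {
                old.add(o);                 // promoted to the Old Generation
            } else {
                otherSurvivor.add(o);
            }
        }
        eden.clear();                       // Eden is empty after every minor GC
        survivor = otherSurvivor;           // the two survivor spaces swap roles
    }

    // One long-lived object, plus one short-lived object per GC cycle.
    static PromotionSketch demo() {
        PromotionSketch heap = new PromotionSketch();
        heap.allocate("cache");                         // long-lived: stays referenced
        for (int cycle = 0; cycle < TENURING_THRESHOLD; cycle++) {
            Obj temp = heap.allocate("temp" + cycle);
            temp.live = false;                          // dropped before the next GC
            heap.minorGc();
        }
        return heap;
    }

    public static void main(String[] args) {
        PromotionSketch heap = demo();
        System.out.println("old generation: " + heap.old.get(0).name);    // cache
        System.out.println("survivor objects: " + heap.survivor.size()); // 0
    }
}
```

In the demo, the short-lived objects are never copied. The long-lived object ages through the survivor spaces and ends up in the Old Generation after its third minor GC.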
In Java 8, PermGen was replaced by Metaspace, which uses native memory to represent class metadata. The PermSize and MaxPermSize options no longer apply. By default, Metaspace grows as needed. Instead of failing with an OutOfMemoryError when a fixed-size PermGen fills up, the JVM reserves more native memory to meet the demand. Metaspace can still be capped with -XX:MaxMetaspaceSize, in which case exceeding the cap produces an OutOfMemoryError for Metaspace.
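The standard java.lang.management API exposes Metaspace as a non-heap memory pool, so you can check how much of it a running JVM uses. This sketch assumes a HotSpot-based JVM on Java 8 or later, where that pool is named "Metaspace". On other JVMs the lookup may find nothing. The class MetaspaceInfo is hypothetical. Without -XX:MaxMetaspaceSize, the reported maximum is typically -1 (undefined).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Sketch: read the Metaspace pool through the standard management API.
// Assumes a HotSpot-based JVM (Java 8+), where the pool is named "Metaspace".
class MetaspaceInfo {

    // Current Metaspace usage, or null if no pool named "Metaspace" is found.
    static MemoryUsage metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MemoryUsage usage = metaspaceUsage();
        if (usage == null) {
            System.out.println("No Metaspace pool on this JVM");
            return;
        }
        System.out.println("Metaspace used:      " + usage.getUsed() + " bytes");
        System.out.println("Metaspace committed: " + usage.getCommitted() + " bytes");
        // getMax() returns -1 when no maximum is defined.
        System.out.println("Metaspace max:       " + usage.getMax());
    }
}
```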
During a full garbage collection event, unused objects in all generations are garbage collected.
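A full collection covers every generation, and the same management API can list the heap pools (generations) that the running JVM actually has. This is a sketch with a hypothetical class name. The pool names depend on the collector in use. For example, G1 reports pools such as "G1 Eden Space" and "G1 Old Gen", while the Parallel collector reports "PS Eden Space" and "PS Old Gen".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Sketch: list the heap memory pools (the generations) exposed by this JVM.
class MemoryPoolLister {

    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) { // skip non-heap pools such as Metaspace
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The exact names vary with the garbage collector in use.
        for (String name : heapPoolNames()) {
            System.out.println(name);
        }
    }
}
```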
Garbage Collection Roots (GC Roots):
GC roots are references held outside the heap, such as variables on a thread's stack, that point to objects in the heap. Every live object must be reachable from one or more roots. As long as the application can reach those roots, the whole object tree hanging off them is reachable.
There are four main kinds of GC roots in Java:
- Local variables: These are kept alive by the stack of a thread. They are not real object references in the heap and are therefore not visible as objects, but for all intents and purposes local variables are GC roots.
- Active Java threads: A running thread is always considered a live object and is therefore a GC root. This is especially important for thread-local variables.
- Static variables: These are referenced by their classes, which makes them GC roots in practice. Classes themselves can be garbage collected, which also removes all the static variables they reference.
- JNI references: These are Java objects created by native code as part of a JNI call. Such objects are treated specially because the JVM does not know whether the native code still references them.
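The point about thread-local variables can be sketched with java.lang.ThreadLocal. Each value is held by its owning Thread object, so it remains reachable while that thread is alive unless remove() is called. The class ThreadLocalRootDemo and the BUFFER field are hypothetical.

```java
// Sketch: a ThreadLocal value is held in a map owned by its Thread object, so it
// stays reachable for as long as that thread is alive, unless remove() is called.
// BUFFER is a hypothetical per-thread cache used only for illustration.
class ThreadLocalRootDemo {

    static final ThreadLocal<byte[]> BUFFER = ThreadLocal.withInitial(() -> new byte[1024]);

    static boolean demo() {
        byte[] first = BUFFER.get();
        boolean sameWhileSet = BUFFER.get() == first;     // reachable through the current thread
        BUFFER.remove();                                  // the thread no longer references it
        boolean freshAfterRemove = BUFFER.get() != first; // withInitial supplies a new array
        BUFFER.remove();                                  // clean up the fresh array too
        return sameWhileSet && freshAfterRemove;
    }

    public static void main(String[] args) {
        System.out.println("ThreadLocal behaves as expected: " + demo()); // true
    }
}
```

This matters most in thread pools. Worker threads live for a long time, so forgotten ThreadLocal values can stay reachable indefinitely.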
A simple Java application has the following GC roots:
- Local variables in the main method.
- The main thread.
- Static variables of the main class.
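The roots above can be annotated on a small program. SimpleApp, REGISTRY and greeting are hypothetical names chosen for illustration, and the comments mark which reference plays which role.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical application annotated with the three GC roots listed above.
class SimpleApp {

    // Root: a static variable of the main class. Whatever REGISTRY references
    // stays reachable while the SimpleApp class is loaded.
    static final List<String> REGISTRY = new ArrayList<>();

    public static void main(String[] args) {
        // Root: a local variable in main, held on the calling thread's stack.
        StringBuilder greeting = new StringBuilder("hello");

        // Copy the text into the static list so it outlives main's local variables.
        REGISTRY.add(greeting.toString());

        // Root: the running thread itself (normally the "main" thread).
        Thread current = Thread.currentThread();
        System.out.println(current.getName() + ": " + greeting);

        // After this line no root references the StringBuilder, so it is eligible
        // for GC, while the String stored in REGISTRY is still reachable.
        greeting = null;
    }
}
```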
Mark and Sweep Algorithm:
To determine which objects are no longer in use, the JVM periodically runs a mark-and-sweep algorithm. It is a two-step process:
- Mark: the algorithm traverses all object references, starting from the GC roots, and marks every object it finds as alive.
- Sweep: heap memory that is not occupied by marked objects is reclaimed and marked as free.
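The two steps can be sketched on an explicit object graph. This toy model uses a hypothetical class MarkSweepSketch and Node type. The mark phase is an iterative depth-first traversal, and the sweep phase is a single pass over a list. The demo also shows that an unreachable cycle (d and e referencing each other) is still reclaimed, because marking starts only from the roots.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy mark-and-sweep over an explicit object graph. The "heap" is a plain list
// of nodes; this illustrates the two phases, not HotSpot's implementation.
class MarkSweepSketch {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>(); // outgoing references
        boolean marked;
        Node(String name) { this.name = name; }
    }

    // Mark phase: traverse references from the roots and mark every node found.
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue;          // already visited (handles cycles)
            n.marked = true;
            for (Node child : n.refs) {
                if (!child.marked) pending.push(child);
            }
        }
    }

    // Sweep phase: reclaim unmarked nodes and clear marks for the next cycle.
    static List<String> sweep(List<Node> heap) {
        List<String> freed = new ArrayList<>();
        for (Iterator<Node> it = heap.iterator(); it.hasNext(); ) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;
            } else {
                freed.add(n.name);
                it.remove();
            }
        }
        return freed;
    }

    static List<String> demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        Node d = new Node("d"), e = new Node("e");
        a.refs.add(b);   // a -> b
        b.refs.add(c);   // b -> c
        d.refs.add(e);   // d -> e
        e.refs.add(d);   // e -> d: a cycle that no root reaches
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d, e));
        mark(Arrays.asList(a));   // 'a' plays the role of the only GC root
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("freed: " + demo()); // freed: [d, e]
    }
}
```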
As of Java 8, the HotSpot JVM offers four garbage collectors:
- Serial: Garbage collection events are performed serially in one thread. Memory compaction is performed after each garbage collection event.
- Parallel: Multiple threads are used for minor garbage collection. A single thread is used for major garbage collections and Old Generation memory compaction. Alternatively, the Parallel Old variant uses multiple threads for major garbage collection and Old Generation compaction.
- Concurrent Mark and Sweep (CMS): Multiple threads are used for minor garbage collection, using the same algorithm as Parallel. Major garbage collection is also multi-threaded, like Parallel Old, but CMS runs concurrently with the application. Most of the old-generation work is done without stopping the application threads, although CMS still needs short stop-the-world pauses for its initial mark and remark phases. No compaction is performed.
- G1 (Garbage First): The newest of these collectors, G1 is intended as a replacement for CMS. Like CMS, it is parallel and concurrent. The Garbage-First (G1) collector is a server-style garbage collector targeted at multi-processor machines with large memories. It meets garbage collection (GC) pause-time goals with high probability while achieving high throughput.
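The management API reports one GarbageCollectorMXBean per collector, so you can check which collector a JVM is actually using. The class ActiveCollector is hypothetical. Its name table reflects the bean names commonly reported by HotSpot and should be treated as an assumption for other JVMs or versions. The collectors can be selected with -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC (removed in JDK 14) or -XX:+UseG1GC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: report which collector the running JVM uses and how often it has run.
class ActiveCollector {

    // Maps commonly reported HotSpot MXBean names to the collectors described above.
    // The table is an assumption; unknown names fall through to "other".
    static String describe(String beanName) {
        switch (beanName) {
            case "Copy":
            case "MarkSweepCompact":
                return "Serial";
            case "PS Scavenge":
            case "PS MarkSweep":
                return "Parallel";
            case "ParNew":
            case "ConcurrentMarkSweep":
                return "CMS";
            case "G1 Young Generation":
            case "G1 Old Generation":
                return "G1";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() and getCollectionTime() return -1 if undefined.
            System.out.printf("%s (%s): %d collections, %d ms%n",
                    gc.getName(), describe(gc.getName()),
                    gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```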