Friday, October 9, 2015

How to reset ArrayList in Java - Clear vs RemoveAll

Many times we want to reset an ArrayList for the reusing purpose, by resetting we mean clearing it or removing all elements. There are two ways to reset an ArrayList in Java, by using clear() method or calling removeAll(). If your ArrayList is small enough e.g. contains only 10 or 100 elements then you can use any of these two methods without worrying too much, but, if you have a huge list with lots of objects e.g. an ArrayList containing 10M entries, then choice of clear() vs removeAll() can make a huge difference in performance of your Java application. Sometimes it's even better to create a new ArrayList instead of resetting the old one, especially if resetting takes long time, but this also has a caveat, you need to make sure that old ArrayList is eligible for garbage collection, otherwise there is huge risk of java.lang.OutOfMemoryError: Java Heap Space. Coming back to clear() vs removeAll() method, you should always use clear(), because it gives you O(n) performance, while removeAll(Collection c) is worse, it gives O(n^2) performance, that's why you see huge difference in time taken by clearing a large ArrayList by these two methods. Things will be obvious, when you will run our example program and see the code of clear() and removeAll() method from JDK API. By the way, if you are in doubt, use clear() method and if not then always prefer clear over removeAll in Java.


Clear() vs RemoveAll(Collection c)
In order to compare the performance of both these methods, it's very important to see their code. You can check the source code of the clear() method in java.util.ArrayList class, for convenience I have included it here. This code is from JDK version 1.7.0_40.
   /**
     * Removes all of the elements from this list.  The list will
     * be empty after this call returns.
     */
    public void clear() {
        modCount++;
        // clear to let GC do its work
        for (int i = 0; i < size; i++)
            elementData[i] = null;
        size = 0;
    }


You can see that this method loop over ArrayList and assign null to every element to make them eligible for garbage collection, of course if there is no external reference. Similarly, you can check the source code of java.util.AbstractCollection class to look at how removeAll(Collection c) method works, here is snippet:

public boolean removeAll(Collection c) {
        boolean modified = false;
        Iterator it = iterator();
        while (it.hasNext()) {
            if (c.contains(it.next())) {
                it.remove();
                modified = true;
            }
        }
        return modified;
 }


This implementation iterate over the collection, checking each element returned by the iterator in turn to see if it's contained in the specified collection.  If it's present the it is removed from this collection by using Iterator's remove method. Because of using contains() method, removeAll() performance goes into the range of O(n^2), which is an absolutely NO, especially if you are trying to reset a large ArrayList. Now let's see their performance in action to reset an ArrayList of just 100K entries.

 Removing all elements from ArrayList with 100K Objects

 The removeAll(Collection c) are taking 10000 times more time than clear to reset. Actually purpose of clear() and removeAll(Collection c) are different in API, clear() method is meant to reset a Collection by removing all elements, while removeAll(Collection c) only removes elements which are present in supplied collection. This method is not designed to remove all elements from a Collection. So, if your intention is to delete all elements from a Collection, then use clear(), while if you want to remove only some elements, which are present in another Collection, e.g. list of closed orders, than use removeAll() method .

import java.util.ArrayList;


public class ArrayListResetTest {
    private static final int SIZE = 100_000;
    public static void main(String args[]) {
    
        // Two ArrayList for clear and removeAll
        ArrayList numbers = new ArrayList(SIZE);
        ArrayList integers = new ArrayList(SIZE);
       
        // Initialize ArrayList with 10M integers
        for (int i = 0; i < SIZE; i++) {
            numbers.add(new Integer(i));
            integers.add(new Integer(i));
        }
    
        // Empty ArrayList using clear method
        long startTime = System.nanoTime();
        numbers.clear();
        long elapsed = System.nanoTime() - startTime;
        System.out.println("Time taken by clear to empty ArrayList of 1M elements (ns): " + elapsed);
    

       // Reset ArrayList using removeAll method
        startTime = System.nanoTime();
        integers.removeAll(integers);
        long time = System.nanoTime() - startTime;
        System.out.println("Time taken by removeAll to reset ArrayList of 1M elements (ns): " + time);
    }
}

Output:
Time taken by clear to empty ArrayList of 100000 elements (ns): 889619
Time taken by removeAll to reset ArrayList of 100000 elements (ns): 36633112126


Make sure you provide sufficient memory to run this program because it's uses two ArrayList to store Integers, especially if you want to compare the performance of clear() and removeAll() for List with 1M elements. You also need Java 7 to run this program because I am using underscore with the numeric literal feature. If you don't have JDK 7 then just remove underscores from SIZE constants, those are just for improving readability.

Read More 

Java 8 Features
How to Reset Arraylist In Java
How HashMap Work in Java
Why wait (), notify () and notifyAll () must be called from synchronized block or method in Java
XPath to locate Information in XML
Internals of Garbage Collector
Reference Type in Java
Different Ways to Create ObjectClass Loaders in Java
Producer Consumer Problem
Why String is Final in Java
Singleton Class using Enum
JSON tutorial
Exceptional Handling in Java