Tuesday, April 4, 2017

String Deduplication in Java 8

Java 8 update 20 has introduced a new feature called "String deduplication" which can be used to save memory from duplicate String object in Java application, which can improve the performance of your Java application and prevent java.lang.OutOfMemoryError if your application makes heavy use of String. If you have profiled a Java application to check which object is taking the bulk of memory, you will often find char[] object at the top of the list, which is nothing but internal character array used by String object.

Since from Java 7 onward, String has stopped sharing character array with sub-strings, the memory occupied by String object has gone higher, which had made the problem even worse

The String deduplication is trying to bridge that gap. It reduces the memory footprint of String object on the Java Heap space by taking advantage of the fact that many String objects are identical. Instead of each String object pointing to their own character array, identical String object can point to the same character array.

Btw, this is not exactly same as it was before Java 7 update 6, where substring also points to the same character array, but can greatly reduce memory occupied by duplicate String in JVM. Anyway, In this article, you will see how you can enable this feature in Java 8 to reduce memory consumed by duplicate String objects.

How to enable String deduplication in Java 8

String deduplication is not enabled by default in Java 8 JVM. You can enable String deduplication feature by using -XX:+UseStringDeduplication option. Unfortunately, String deduplication is only available for the G1 garbage collector, so if you are not using G1 GC then you cannot use the String deduplication feature. It means just providing -XX:+UseStringDeduplication will not work, you also need to turn on G1 garbage collector using -XX:+UseG1GC option.

String deduplication also doesn't consider relatively young String for processing. The minimal age of processed String is controlled by -XX:StringDeduplicationAgeThreshold=3 option. The default value of this parameter is 3.

Important points

Here are some of the important points about String deduplication feature of Java 8:

1) This option is only available from Java 8 Update 20 JDK release.
2) This feature will only work along with G1 garbage collector, it will not work with other garbage collectors e.g. Concurrent Mark Sweep GC.
3) You need to provide both -XX:+UseG1GC and -XX:+StringDeduplication JVM options to enable this feature, first one will enable the G1 garbage collector and the second one will enable the String deduplication feature within G1 GC.
4) You can optionally use -XX:+PrintStringDeduplicationStatistics JVM option to analyze what is happening through the command-line.
5) Not every String is eligible for deduplication, especially young String objects are not visible, but you can control this by using  -XX:StringDeduplicationAgeThreshold=3 option to change when Strings become eligible for deduplication.
6) It is observed in general this feature may decrease heap usage by about 10%, which is very good, considering you don't have to do any coding or refactoring.
7) String deduplication runs as a background task without stopping your application.

Wednesday, February 22, 2017

Types of NoSQL Database/Datastore


Wide Row - Also known as wide-column stores, these databases store data in rows
and users are able to perform some query operations via column-based access. A
wide-row store offers very high performance and a highly scalable architecture.
Examples include: Cassandra, HBase, and Google BigTable.


Columnar - Also known as column oriented store. Here the columns of all the rows
are stored together on disk. A great fit for analytical queries because it reduces disk
seek and encourages array like processing. Amazon Redshift, Google BigQuery,
Teradata (with column partitioning).


Key/Value - These NoSQL databases are some of the least complex as all of the data
within consists of an indexed key and a value. Examples include Amazon DynamoDB,
Riak, and Oracle NoSQL database


Document - Expands on the basic idea of key-value stores where "documents" are
more complex, in that they contain data and each document is assigned a unique
key, which is used to retrieve the document. These are designed for storing,
retrieving, and managing document-oriented information, also known as semistructured data. Examples include MongoDB and CouchDB


Graph - Designed for data whose relationships are well represented as a graph
structure and has elements that are interconnected; with an undetermined number of
relationships between them. Examples include: Neo4J, OrientDB and TitanDB

Tuesday, February 21, 2017

Why PermGen has been removed from Java 8 ?


The Java Virtual Machine (JVM) uses an internal representation of its classes containing per-class metadata such as class hierarchy information, method data and information (such as bytecodes, stack and variable sizes), the runtime constant pool and resolved symbolic reference and Vtables.
In the past (when custom class loaders weren’t that common), the classes were mostly “static” and were infrequently unloaded or collected, and hence were labeled “permanent”. Also, since the classes are a part of the JVM implementation and not created by the application they are considered “non-heap” memory.
For HotSpot JVM prior to JDK8, these “permanent” representations would live in an area called the “permanent generation”. This permanent generation was contiguous with the Java heap and was limited to -XX:MaxPermSize that had to be set on the command line before starting the JVM or would default to 64M (85M for 64bit scaled pointers). The collection of the permanent generation would be tied to the collection of the old generation, so whenever either gets full, both the permanent generation and the old generation would be collected. One of the obvious problems that you may be able to call out right away is the dependency on the ‑XX:MaxPermSize. If the classes metadata size is beyond the bounds of ‑XX:MaxPermSize, your application will run out of memory and you will encounter an OOM (Out of Memory) error.

Following are the drawbacks in PermGen
  • Fixed size at startup – difficult to tune.
  • Internal Hotspot types were Java objects : Could move with full GC, opaque, not strongly typed and hard to debug, needed meta-metadata.
  • Simplify full collections : Special iterators for metadata for each collector
  • Want to deallocate class data concurrently and not during GC pause
  • Enable future improvements that were limited by PermGen.
The Permanent Generation (PermGen) space has completely been removed and is kind of replaced by a new space called Metaspace. The consequences of the PermGen removal is that obviously the PermSize and MaxPermSize JVM arguments are ignored and you will never get a java.lang.OutOfMemoryError: PermGen error. PermGen In Java 8 it was removed and replaced by area called Metaspace.
Advantages of MetaSpace
  • Take advantage of Java Language Specification property : Classes and associated metadata lifetimes match class loader’s
  • Per loader storage area – Metaspace
  • Linear allocation only
  • No individual reclamation (except for RedefineClasses and class loading failure)
  • No GC scan or compaction
  • No relocation for metaspace objects

Monday, August 8, 2016

Java Code for Converting Speech to Text

Jars Required

1) cmudict04.jar
2) cmulex.jar
3) cmutimelex.jar
4) cmu_time_awb.jar
5) cmu_us_kal.jar
6) en_us.jar
7) reetts-jsapi10.jar
8) freetts.jar
9) mbrola.jar

Code for Text to Speech

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class TextToSpeech {
    // Default voice is Kevin16
    private static final String VOICENAME = "kevin16";
  
    public static void main(String[] args) {
    Voice voice;
    // Taking instance of voice from VoiceManager factory.
    VoiceManager voiceManager = VoiceManager.getInstance();
    voice = voiceManager.getVoice(VOICENAME);
    // Allocating voice
    voice.allocate();
    // word per minute
    voice.setRate(120);
    voice.setPitch(100);
    System.out.print("Enter your text: ");
    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    try {
    // Ready to speak
    voice.speak(br.readLine());
    } catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
    }
    }   
   }

Saturday, July 30, 2016

How to decide whether to use newCachedThreadPool or newFixedThreadPool?

newFixedThreadPool()

Creates a thread pool that reuses a fixed number of threads...
This means that if you ask for 2 threads, it will start 2 threads and never start 3.
A FixedThreadPool does have its advantages when you do in fact want to work with a fixed number of threads, since then you can submit any number of tasks to the executor service while knowing that the number of threads will be maintained at the level you specified. If you explicitly want to grow the number of threads, then this is not the appropriate choice.

This does however mean that the one issue that you may have with the CachedThreadPool is in regards to limiting the number of threads that are running concurrently. The CachedThreadPool will not limit them for you, so you may need to write your own code to ensure that you do not run too many threads. This really depends on the design of your application and how tasks are submitted to the executor service.



On the other hand, the newCachedThreadPool()

Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available.
In your case, if you only have 2 thread to run, either will work fine since you will only be submitting 2 jobs to your pool. However, if you wanted to submit all 20 jobs at once but only have 2 jobs running at one time, you should use a newFixedThreadPool(2). Otherwise all 20 jobs will run at the same time which may not be optimal depending on how many CPUs you have.

Typically  use the newCachedThreadPool() when you need the thread to be spawned immediately, even if all of the threads currently running are busy. I recently used it when I was spawning timer tasks. The number of concurrent jobs are immaterial because I never spawn very many but I want them to run when they are requested and I want them to re-use dormant threads.

Use newFixedThreadPool() when you want to limit the number of concurrent tasks running at any one point to maximize performance and not swamp the server. For example if I am processing 100k lines from a file, one line at a time, I don't want each line to start a new thread but I want some level of concurrency so I allocate (for example) 10 fixed threads to run the tasks until the pool is exhausted.