Thursday, June 29, 2017
Java Code to check if two binary trees are identical or not
The idea is to traverse both trees and compare the values at their root nodes. If the values match, we recursively check whether the left subtree of the first tree is identical to the left subtree of the second tree, and whether the right subtree of the first tree is identical to the right subtree of the second tree. If the values at the root nodes differ, the trees violate the data property. If at any point in the recursion one tree is empty and the other is non-empty, the trees violate the structural property, so they cannot be identical.
// Java program to check whether two binary trees are identical
// A binary tree node
class Node
{
    int data;
    Node left, right;
    Node(int item)
    {
        data = item;
        left = right = null;
    }
}
class BinaryTree
{
    Node root1, root2;
    /* Given two trees, return true if they have the same structure and the same data */
    boolean identicalTrees(Node a, Node b)
    {
        /* 1. both empty -> identical */
        if (a == null && b == null)
            return true;
        /* 2. both non-empty -> compare the root data, then both pairs of subtrees */
        if (a != null && b != null)
            return (a.data == b.data
                    && identicalTrees(a.left, b.left)
                    && identicalTrees(a.right, b.right));
        /* 3. one empty, one not -> not identical */
        return false;
    }
    /* Driver program to test identicalTrees() function */
    public static void main(String[] args)
    {
        BinaryTree tree = new BinaryTree();
        tree.root1 = new Node(1);
        tree.root1.left = new Node(2);
        tree.root1.right = new Node(3);
        tree.root1.left.left = new Node(4);
        tree.root1.left.right = new Node(5);
        tree.root2 = new Node(1);
        tree.root2.left = new Node(2);
        tree.root2.right = new Node(3);
        tree.root2.left.left = new Node(4);
        tree.root2.left.right = new Node(5);
        if (tree.identicalTrees(tree.root1, tree.root2))
            System.out.println("Both trees are identical");
        else
            System.out.println("Trees are not identical");
    }
}
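The recursive version uses one stack frame per tree level, so a very deep, list-like tree could in principle trigger a StackOverflowError. As an alternative sketch (not the author's method), the same comparison can be done with an explicit stack of node pairs. The class name IterativeIdenticalTrees and its nested Node type are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class IterativeIdenticalTrees {
    // Minimal node type for this sketch (same shape as the Node class above).
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    // Compares two trees pair by pair using an explicit stack instead of recursion.
    static boolean identical(Node a, Node b) {
        Deque<Node[]> pending = new ArrayDeque<>();
        pending.push(new Node[] {a, b});
        while (!pending.isEmpty()) {
            Node[] pair = pending.pop();
            Node x = pair[0], y = pair[1];
            if (x == null && y == null) continue;      // both empty: this pair matches
            if (x == null || y == null) return false;  // structural mismatch
            if (x.data != y.data) return false;        // data mismatch
            pending.push(new Node[] {x.left, y.left});
            pending.push(new Node[] {x.right, y.right});
        }
        return true;
    }

    public static void main(String[] args) {
        Node t1 = new Node(1); t1.left = new Node(2); t1.right = new Node(3);
        Node t2 = new Node(1); t2.left = new Node(2); t2.right = new Node(3);
        System.out.println(identical(t1, t2)); // true
        t2.right.left = new Node(4);
        System.out.println(identical(t1, t2)); // false
    }
}
```

The stack holds pairs (as two-element arrays) because ArrayDeque rejects null elements, while the pairs themselves may contain empty subtrees.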
What are Microservices?
The idea behind microservices is that some types of applications become easier to build and maintain when they are broken down into smaller, composable pieces which work together. Each component is developed separately, and the application is then simply the sum of its constituent components. This is in contrast to a traditional, "monolithic" application, which is developed all in one piece.
There are many reasons why this approach is considered an easier way to develop large applications, particularly enterprise applications and various types of software as a service delivered over the Internet.
One reason is project engineering: when the different components of an application are separated, they can be developed concurrently. Another is resilience. Rather than relying on a single virtual or physical machine, components can be spread across multiple servers or even multiple data centers. If a component dies, you spin up another, and the rest of the application can continue to function. Microservices also allow more efficient scaling: rather than scaling up with bigger and more powerful machines, or running more copies of the entire application, you can scale out with duplicate copies of only the most heavily used parts.
Is this a new concept?
The idea of separating applications into smaller parts is nothing new; there are other programming paradigms which address this same concept, such as Service Oriented Architecture (SOA). What may be new are some of the tools and techniques used to deliver on the promise of microservices.
The common definition of microservices generally relies upon each microservice providing an API endpoint, often but not always a stateless REST API which can be accessed over HTTP(S) just like a standard web page. Accessing microservices this way makes them easy for developers to consume, since it requires only tools and methods many developers are already familiar with.
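To make the idea of a stateless REST endpoint concrete, here is a minimal sketch using the JDK's built-in com.sun.net.httpserver.HttpServer (it assumes Java 9 or later for InputStream.readAllBytes). The class name GreetingService, the /greeting path, and the JSON body are hypothetical; a real service would normally use a full web framework. The fetch method shows that a consumer needs nothing more than a standard HTTP client.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class GreetingService {
    // Starts a stateless HTTP endpoint; port 0 asks the OS for any free port.
    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/greeting", exchange -> {
                // No per-client state is kept: every request gets the same answer.
                byte[] body = "{\"message\":\"hello\"}".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A consumer only needs an ordinary HTTP client to call the service.
    static String fetch(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start(0);
        int port = server.getAddress().getPort();
        System.out.println(fetch("http://127.0.0.1:" + port + "/greeting"));
        server.stop(0);
    }
}
```

Because the handler keeps no per-client state, several copies of such a service could run side by side, which is what makes scaling out straightforward.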
Microservices depend not just on the technology being set up to support this concept, but on an organization having the culture, know-how, and structures in place for development teams to be able to adopt this model. Microservices are a part of a larger shift in IT departments towards a DevOps culture, in which development and operations teams work closely together to support an application over its lifecycle, and go through a rapid or even continuous release cycle rather than a more traditional long cycle.
Why is open source important for microservices?
When you design your applications from the ground up to be modular and composable, it allows you to use drop-in components in many places where in the past you may have required proprietary solutions, either because of the licensing of the components or because of specialized requirements. Many application components can be off-the-shelf open source tools.
A focus on microservices may also make it easier for application developers to offer alternative interfaces to your applications. When everything is an API, communications between application components become standardized. All a component has to do to make use of your application and data is to be able to authenticate and communicate across those standard APIs. This allows both those inside and, when appropriate, outside your organization to easily develop new ways to utilize your application's data and services.
Where do Docker and container technologies come in?
Many people see Docker or other container technologies as enablers of a microservice architecture.
Unlike virtual machines, containers are designed to be pared down to the minimal pieces needed to run the one thing the container is designed to do, rather than packing multiple functions into the same virtual or physical machine. The ease of development that Docker and similar tools provide helps make rapid development and testing of services possible.
Of course, containers are just a tool, and microservice architecture is just a concept. So it is entirely possible to build an application which could be described as following a microservices approach without using containers, just as it would be possible to build a much more traditional application inside of a container (although this may not be a good idea).
How do you orchestrate microservices?
In order to actually run an application based on microservices, you need to be able to monitor, manage, and scale the different constituent parts. There are a number of different tools that might allow you to accomplish this. For containers, open source tools like Kubernetes, Docker Swarm, or Apache projects like Mesos or ZooKeeper might be a part of your solution. Alternatively, for non-container pieces of an application, other tools may be used for orchestrating components: for example, in an OpenStack cloud you might use Heat for managing application components. Another option is to use a Platform as a Service (PaaS) tool, which lets developers focus on writing code by abstracting some of the underlying orchestration technology and allowing them to easily select off-the-shelf open source components for certain parts of an application, like a database storage engine, a logging service, a continuous integration server, a web server, or other pieces of the puzzle. Some PaaS systems like OpenShift directly use upstream projects like Docker and Kubernetes for managing application components, while others try to re-implement management tools themselves.
What about existing applications?
While utilizing microservices may be an important component of an organization's IT strategy going forward, there are certainly many applications which don't fit this model, nor is it likely that those applications will be rearchitected overnight to meet this new paradigm. Microservices and traditional applications can work together in the same environments, provided the organization has a solid bi-modal IT strategy.
Thursday, June 1, 2017
Data Structures - Asymptotic Analysis (Big O, Theta θ, Omega Ω Notation)
Asymptotic analysis of an algorithm refers to defining mathematical bounds on its run-time performance. Using asymptotic analysis, we can describe the best-case, average-case, and worst-case behavior of an algorithm.
Asymptotic analysis is input-bound: if an algorithm takes no input, it is considered to run in constant time. All factors other than the input are treated as constants. The main idea of asymptotic analysis is to obtain a measure of the efficiency of algorithms that does not depend on machine-specific constants and does not require the algorithms to be implemented and the running times of programs to be compared. Asymptotic notations are the mathematical tools used to express the time complexity of algorithms in asymptotic analysis.
Asymptotic analysis expresses the running time of an operation in mathematical units of computation. For example, the running time of one operation might be f(n) = n while that of another is g(n) = n^2. The running time of the first operation then grows linearly as n increases, while that of the second grows quadratically. Conversely, when n is very small, the running times of the two operations can be nearly the same.
Usually, the time required by an algorithm falls under three types −
Best Case − Minimum time required for program execution.
Average Case − Average time required for program execution.
Worst Case − Maximum time required for program execution.
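The three cases are easiest to see on a simple algorithm. The sketch below uses linear search (an example chosen for illustration; the class SearchCases and its method are hypothetical) and counts comparisons: the best case is 1 comparison, the worst case is n, and if the key is equally likely to be at any position, the average is about (n + 1)/2.

```java
class SearchCases {
    // Returns how many comparisons a linear search makes before it stops.
    static int comparisons(int[] a, int key) {
        int count = 0;
        for (int x : a) {
            count++;
            if (x == key) break; // found: stop early
        }
        return count;
    }

    public static void main(String[] args) {
        int[] a = {7, 3, 9, 1, 5};
        System.out.println(comparisons(a, 7)); // best case, key is first: 1
        System.out.println(comparisons(a, 4)); // worst case, key absent: 5 (= n)
    }
}
```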
Asymptotic Notations
Following are the commonly used asymptotic notations to calculate the running time complexity of an algorithm.
Ο Notation
Ω Notation
θ Notation
Big Oh Notation, Ο
The notation Ο(n) is the formal way to express an upper bound on an algorithm's running time. It is most often used to state the worst-case time complexity, that is, the longest time an algorithm can possibly take to complete. Big O notation bounds a function only from above. For example, consider Insertion Sort: it takes linear time in the best case and quadratic time in the worst case. We can safely say that the time complexity of Insertion Sort is O(n^2). Note that O(n^2) also covers linear time.
If we use Θ notation to represent time complexity of Insertion sort, we have to use two statements for best and worst cases:
1. The worst case time complexity of Insertion Sort is Θ(n^2).
2. The best case time complexity of Insertion Sort is Θ(n).
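To see Insertion Sort's linear best case and quadratic worst case concretely, this sketch counts element comparisons: a sorted input of n elements needs n - 1 comparisons, while a reverse-sorted one needs n(n - 1)/2. The comparison counter is hypothetical instrumentation added for illustration.

```java
import java.util.Arrays;

class InsertionSortCount {
    // Sorts a in place (ascending) and returns how many element comparisons were made.
    static long sortAndCount(int[] a) {
        long comparisons = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0) {
                comparisons++;
                if (a[j] <= key) break; // key belongs right after a[j]
                a[j + 1] = a[j];        // shift the larger element one slot right
                j--;
            }
            a[j + 1] = key;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] sorted = new int[n];
        int[] reversed = new int[n];
        for (int i = 0; i < n; i++) {
            sorted[i] = i;
            reversed[i] = n - 1 - i;
        }
        System.out.println(sortAndCount(sorted));   // 999 = n - 1 (best case, linear)
        System.out.println(sortAndCount(reversed)); // 499500 = n(n - 1)/2 (worst case, quadratic)
        System.out.println(Arrays.equals(sorted, reversed)); // true: both arrays are now sorted
    }
}
```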
Omega Notation, Ω
The notation Ω(n) is the formal way to express a lower bound on an algorithm's running time. It is most often used to state the best-case time complexity, that is, the least time an algorithm can possibly take to complete. Just as Big O notation provides an asymptotic upper bound on a function, Ω notation provides an asymptotic lower bound. Ω notation is useful when we know a lower bound on the time complexity of an algorithm.
Theta Notation, θ
The notation θ(n) is the formal way to express both a lower bound and an upper bound on an algorithm's running time. Theta notation bounds a function from above and below, so it defines its exact asymptotic behavior.
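For reference, these are the standard textbook definitions behind the three notations (as given, for example, in CLRS), stated for non-negative functions f and g:

```latex
\begin{aligned}
f(n) = O(g(n)) &\iff \exists\, c > 0,\ n_0 \text{ such that } 0 \le f(n) \le c\,g(n) \text{ for all } n \ge n_0 \\
f(n) = \Omega(g(n)) &\iff \exists\, c > 0,\ n_0 \text{ such that } 0 \le c\,g(n) \le f(n) \text{ for all } n \ge n_0 \\
f(n) = \Theta(g(n)) &\iff \exists\, c_1, c_2 > 0,\ n_0 \text{ such that } 0 \le c_1\,g(n) \le f(n) \le c_2\,g(n) \text{ for all } n \ge n_0
\end{aligned}
```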
A simple way to get Theta notation of an expression is to drop low order terms and ignore leading constants. For example, consider the following expression.
3n^3 + 6n^2 + 6000 = Θ(n^3)
Dropping lower-order terms is always fine because there is always an n0 beyond which the n^3 term outgrows the n^2 and constant terms, irrespective of the constants involved.
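As a worked check of the example against the Θ definition, one valid (not unique) choice of constants is c1 = 3, c2 = 5, n0 = 19:

```latex
\begin{aligned}
&\text{Lower bound: } 3n^3 \le 3n^3 + 6n^2 + 6000 \text{ for all } n \ge 1, \text{ so } c_1 = 3. \\
&\text{Upper bound: } 6n^2 \le n^3 \text{ for } n \ge 6, \text{ and } 6000 \le n^3 \text{ for } n \ge 19 \ (19^3 = 6859), \\
&\quad \text{so } 3n^3 + 6n^2 + 6000 \le 3n^3 + n^3 + n^3 = 5n^3 \text{ for all } n \ge 19. \\
&\text{Hence } 3n^3 \le 3n^3 + 6n^2 + 6000 \le 5n^3 \text{ for } n \ge n_0 = 19, \text{ i.e. } 3n^3 + 6n^2 + 6000 = \Theta(n^3).
\end{aligned}
```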
Volatile vs Atomic in Java
Volatile and atomics are two different concepts. Volatile ensures that a certain expected (memory) state is visible consistently across different threads, while atomics ensure that operations on variables are performed atomically.
Take the following example of two threads in Java:
Thread A:
value = 1;
done = true;
Thread B:
if (done)
    System.out.println(value);
Starting with value = 0 and done = false, the rules of threading tell us that it is undefined whether Thread B will print value at all. Furthermore, even if Thread B sees done == true, it may print either 0 or 1! To explain this you need to know a bit about the Java memory model (which can be complex). In short: threads may work with cached copies of variables, and the compiler, the JIT, and the CPU may reorder code to optimize it, so there is no guarantee that the code above runs in exactly that order. Setting done to true and only then setting value to 1 would be a possible outcome of the JIT.
volatile ensures that a new value written to such a variable becomes visible to all other threads, and that writes made before the volatile write cannot be reordered after it, so the code is in the state you would expect. In the code above, declaring done as volatile ensures that whenever Thread B checks the variable, it sees either false or true, and if it sees true, then value has been set to 1 as well.
As a side effect of volatile, reads and writes of such a variable are atomic (at a very minor cost in execution speed). This matters only for long and double (64-bit) variables: the Java Language Specification allows their non-volatile reads and writes to be split into two 32-bit halves, which can happen on 32-bit JVMs, whereas reads and writes of other primitive types and of references are always atomic. But there is an important difference between an atomic access and an atomic operation. Volatile only ensures that each access is atomic, while atomics ensure that the whole operation is atomic.
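The Thread A / Thread B snippet can be made runnable with done declared volatile. This is a sketch: the field names come from the snippet, while the class VolatileFlag and its methods are hypothetical. Without volatile, the reader's loop may spin forever or the reader may observe value == 0; with it, observing done == true guarantees value == 1.

```java
class VolatileFlag {
    private int value = 0;                 // ordinary field, published through 'done'
    private volatile boolean done = false; // volatile: earlier writes become visible with it

    // "Thread A": the write to value happens-before the volatile write to done.
    void writer() {
        value = 1;
        done = true;
    }

    // "Thread B": spins on the volatile flag, then reads value.
    int reader() {
        while (!done) {
            // busy-wait; each iteration re-reads the volatile field
        }
        return value; // after observing done == true, this read sees 1
    }

    // Runs a reader thread against a writer on the calling thread; returns what the reader saw.
    static int runOnce() {
        VolatileFlag f = new VolatileFlag();
        int[] seen = new int[1];
        Thread b = new Thread(() -> seen[0] = f.reader());
        b.start();
        f.writer();
        try {
            b.join(); // join() also makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // 1
    }
}
```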
Take the following example:
i = i + 1;
No matter how you declare i, a different thread reading the value while the line above is executing might get i or i + 1, because the operation is not atomic. If the other thread sets i to a different value, in the worst case thread A could overwrite it, because thread A was in the middle of calculating i + 1 from the old value and then sets i to that old value + 1. For example:
Assume i = 0
Thread A reads i, calculates i+1, which is 1
Thread B sets i to 1000 and returns
Thread A now sets i to the result of the operation, which is i = 1
Atomics like AtomicInteger ensure that such read-modify-write operations happen atomically (for example, incrementAndGet()). So the issue above cannot happen: i will be either 1000 or 1001 once both threads are finished.
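A common way to see the difference is to let several threads increment a plain int and an AtomicInteger side by side. In this sketch (class and method names are hypothetical), the atomic counter always reaches the expected total, while the plain counter, updated with i = i + 1 style code, typically loses some updates; its exact value varies from run to run.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterRace {
    int plain = 0;                                    // updated with plain = plain + 1
    final AtomicInteger atomic = new AtomicInteger(); // updated with incrementAndGet()

    // Starts 'threads' threads that each increment both counters 'perThread' times.
    static CounterRace run(int threads, int perThread) {
        CounterRace c = new CounterRace();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    c.plain = c.plain + 1;      // read and write are separate steps: updates can be lost
                    c.atomic.incrementAndGet(); // one atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        CounterRace c = run(4, 100_000);
        System.out.println("atomic = " + c.atomic.get()); // 400000
        System.out.println("plain  = " + c.plain);        // often less than 400000
    }
}
```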
Does making all fields final make a class immutable in Java?
One of the common misconceptions among Java programmers is that a class with all final fields automatically becomes immutable. This is not correct: you can easily break the immutability of a class if a final field refers to a mutable object, as we'll see in this article. One of the most common examples of such a mutable type is java.util.Date. You have to be extra cautious to keep your class immutable when it has mutable fields. When you return a reference to a mutable object, you share ownership of that object with whoever receives it, which can break invariants such as immutability.
Another pattern that can break immutability is returning a collection or an array from a getter method.
So even though the field pointing to the Date, collection, or array object is final, you can still break the immutability of the class by breaking encapsulation, that is, by returning a reference to the original mutable object.
There are two ways to avoid this problem. First, don't provide getters for mutable objects if you can avoid it. Second, if you must, return a copy or clone of the mutable object. If you are returning a collection, you can wrap it in an unmodifiable collection (for example with Collections.unmodifiableList). Since the elements of an array cannot be made final or unmodifiable in Java, a getter for an array field should return a copy of the array.
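The difference is easy to demonstrate. In this sketch (the classes LeakyEvent, SafeEvent and ImmutabilityDemo are hypothetical), both classes have only final fields, but only the second copies mutable state on the way in and on the way out, so callers cannot change it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;

// All fields final, yet not immutable: the getters leak the internal mutable objects.
final class LeakyEvent {
    private final Date when;
    private final List<String> tags;
    LeakyEvent(Date when, List<String> tags) { this.when = when; this.tags = tags; }
    Date getWhen() { return when; }         // caller can call setTime() on our Date
    List<String> getTags() { return tags; } // caller can add or remove our tags
}

// Same shape, but with defensive copies in the constructor and in the getters.
final class SafeEvent {
    private final Date when;
    private final List<String> tags;
    private final int[] scores;
    SafeEvent(Date when, List<String> tags, int[] scores) {
        this.when = new Date(when.getTime()); // copy in
        this.tags = new ArrayList<>(tags);
        this.scores = scores.clone();
    }
    Date getWhen() { return new Date(when.getTime()); }                  // copy out
    List<String> getTags() { return Collections.unmodifiableList(tags); } // read-only view
    int[] getScores() { return scores.clone(); } // arrays cannot be read-only, so copy
}

class ImmutabilityDemo {
    public static void main(String[] args) {
        LeakyEvent leaky = new LeakyEvent(new Date(0L), new ArrayList<>());
        leaky.getWhen().setTime(1000L); // changes the "immutable" object's state
        System.out.println(leaky.getWhen().getTime()); // 1000
        SafeEvent safe = new SafeEvent(new Date(0L), new ArrayList<>(), new int[] {1, 2});
        safe.getWhen().setTime(1000L); // only the returned copy changes
        System.out.println(safe.getWhen().getTime()); // 0
    }
}
```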
Monday, May 1, 2017
How to make code Thread-Safe in Java
There are multiple ways to make code thread-safe in Java. Items 1 and 2 assume a getCount() method that increments a shared counter with the ++ operator, which is not atomic:
1) Use the synchronized keyword to lock the getCount() method so that only one thread can execute it at a time, which removes the possibility of interleaving.
2) Use AtomicInteger, which makes the ++ operation atomic; since atomic operations are thread-safe, this also saves the cost of external synchronization.
3) Immutable objects are thread-safe by default because their state cannot be modified once created. Since String is immutable in Java, it is inherently thread-safe.
4) Read-only (final) variables are also thread-safe in Java, as long as the objects they refer to are not modified.
5) Locking is another way of achieving thread-safety in Java.
6) Static variables that are not properly synchronized are a major cause of thread-safety issues.
7) Examples of thread-safe classes in Java: Vector, Hashtable, ConcurrentHashMap, String, etc.
8) Atomic operations in Java are thread-safe. For example, reading a 32-bit int from memory is an atomic operation, so it can't interleave with another thread.
9) Local variables are also thread-safe because each thread has its own copy, so using local variables is a good way to write thread-safe code in Java (objects they refer to are safe only if they are not shared with other threads).
10) To avoid thread-safety issues, minimize sharing of objects between multiple threads.
11) The volatile keyword can also be used to instruct threads not to cache a variable but to read it from main memory, and to instruct the JVM not to reorder or optimize code in ways that would break visibility between threads.
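Items 1, 2 and 5 can be sketched side by side. The post's original getCount() code is not shown, so this assumes getCount() increments a counter and returns the new value; the classes SyncCounter, AtomicCounter, LockedCounter and CounterDemo are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class SyncCounter {
    private int count = 0;
    // Item 1: only one thread at a time may run this method on a given instance.
    synchronized int getCount() { return ++count; }
}

class AtomicCounter {
    private final AtomicInteger count = new AtomicInteger();
    // Item 2: incrementAndGet() is a single atomic read-modify-write.
    int getCount() { return count.incrementAndGet(); }
}

class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int count = 0;
    // Item 5: explicit lock, always released in finally.
    int getCount() {
        lock.lock();
        try {
            return ++count;
        } finally {
            lock.unlock();
        }
    }
}

class CounterDemo {
    // Runs 'threads' threads that each call task 'perThread' times, then waits for all of them.
    static void hammer(int threads, int perThread, Runnable task) {
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> { for (int k = 0; k < perThread; k++) task.run(); });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        SyncCounter s = new SyncCounter();
        AtomicCounter a = new AtomicCounter();
        LockedCounter l = new LockedCounter();
        hammer(4, 10_000, s::getCount);
        hammer(4, 10_000, a::getCount);
        hammer(4, 10_000, l::getCount);
        // each counter was incremented 40000 times, so the next call returns 40001
        System.out.println(s.getCount() + " " + a.getCount() + " " + l.getCount()); // 40001 40001 40001
    }
}
```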
Tuesday, April 4, 2017
Top 10 Coding guidelines in Java
The top 10 are as follows
1) Do not expose methods that use reduced-security checks to untrusted code. Certain methods use a reduced-security check that checks only that the calling method is authorized rather than checking every method in the call stack. Any code that invokes these methods must guarantee that they cannot be invoked on behalf of untrusted code.
2) Do not use the clone() method to copy untrusted method parameters. Inappropriate use of the clone() method can allow an attacker to exploit vulnerabilities by providing arguments that appear normal but subsequently return unexpected values. Such objects may consequently bypass validation and security checks.
3) Document thread-safety and use annotations where applicable. The Java language annotation facility is useful for documenting design intent. Source code annotation is a mechanism for associating metadata with a program element and making it available to the compiler, analyzers, debuggers, or Java Virtual Machine (JVM) for examination. Several annotations are available for documenting thread-safety.
4) Be aware of numeric promotion behavior. Promotions in which the operands are converted from an int to a float or from a long to a double can cause a loss of precision.
5) Use a try-with-resources statement to safely handle closeable resources. Using the try-with-resources statement prevents problems that can arise when closing resources with an ordinary try-catch-finally block, such as failing to close a resource because an exception is thrown as a result of closing another resource, or masking an important exception when a resource is closed.
6) Use the same type for the second and third operands in conditional expressions. The complexity of the rules that determine the result type of a conditional expression can result in unintended type conversions. Consequently, the second and third operands of each conditional expression should have identical types.
7) Avoid inadvertent wrapping of loop counters. Unless coded properly, a while or for loop may execute forever, or until the counter wraps around and reaches its final value.
8) Strive for logical completeness. Software vulnerabilities can result when a programmer fails to consider all possible data states.
9) Do not confuse abstract object equality with reference equality. Naïve programmers often confuse the intent of the == operation with that of the Object.equals() method. This confusion is frequently evident in the context of processing of String objects.
10) Understand how escape characters are interpreted when strings are loaded. Many classes allow inclusion of escape sequences in character and string literals. Correct use of escape sequences in string literals requires understanding how the escape sequences are interpreted by the Java compiler, as well as how they are interpreted by any subsequent processor.
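Guidelines 4, 5, 6 and 9 are easy to demonstrate in a few lines. This sketch is illustrative only: GuidelineDemos and its methods are hypothetical names, and the conditional-expression case uses the well-known char/int example.

```java
import java.util.ArrayList;
import java.util.List;

class GuidelineDemos {
    // Guideline 4: int -> float and long -> double widening can silently lose precision.
    static int intThroughFloat(int x) {
        float f = x; // implicit widening primitive conversion
        return (int) f;
    }
    static long longThroughDouble(long x) {
        double d = x;
        return (long) d;
    }

    // Guideline 5: try-with-resources closes resources automatically, in reverse order.
    static class Res implements AutoCloseable {
        private final String name;
        private final List<String> log;
        Res(String name, List<String> log) { this.name = name; this.log = log; log.add("open " + name); }
        @Override public void close() { log.add("close " + name); }
    }
    static List<String> tryWithResources() {
        List<String> log = new ArrayList<>();
        try (Res a = new Res("a", log); Res b = new Res("b", log)) {
            log.add("body");
        }
        return log;
    }

    // Guideline 6: mixed operand types change the type of a conditional expression.
    static String conditionalTypes() {
        char alpha = 'A';
        int i = 0;
        String first = String.valueOf(true ? alpha : 0);   // type char: constant 0 fits in char
        String second = String.valueOf(false ? i : alpha); // type int: alpha is promoted
        return first + " " + second;
    }

    // Guideline 9: == compares references, equals() compares contents.
    static boolean[] equality() {
        String a = "hello";
        String b = new String("hello");
        return new boolean[] { a == b, a.equals(b) };
    }

    public static void main(String[] args) {
        System.out.println(intThroughFloat(16_777_217));       // 16777216
        System.out.println(longThroughDouble((1L << 53) + 1)); // 9007199254740992
        System.out.println(tryWithResources());                // [open a, open b, body, close b, close a]
        System.out.println(conditionalTypes());                // A 65
        boolean[] eq = equality();
        System.out.println(eq[0] + " " + eq[1]);               // false true
    }
}
```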
String Deduplication in Java 8
Java 8 update 20 introduced a new feature called "String deduplication", which saves memory by removing duplicate String data from a Java application. This can improve the performance of your application and reduce the risk of java.lang.OutOfMemoryError if it makes heavy use of Strings. If you profile a Java application to see which objects take up the bulk of the memory, you will often find char[] objects at the top of the list; these are the internal character arrays used by String objects.
Since Java 7 update 6, a substring no longer shares the character array of the String it was taken from, so the memory occupied by String objects has gone up, which made the problem even worse.
String deduplication tries to bridge that gap. It reduces the memory footprint of String objects on the Java heap by taking advantage of the fact that many String objects have the same content. Instead of each String object pointing to its own character array, String objects with equal content can point to the same character array.
This is not exactly the same as the behavior before Java 7 update 6, where a substring pointed to the same character array as its original String, but it can greatly reduce the memory occupied by duplicate Strings in the JVM. In this article, you will see how to enable this feature in Java 8 to reduce the memory consumed by duplicate String objects.
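To make the distinction concrete, here is a minimal sketch (the class and method names are hypothetical) that creates two String objects with equal content but separate identities, which is the kind of duplicate that deduplication targets. As described in JEP 192, deduplication only lets such objects share one backing array behind the scenes; it does not change object identity, so `a == b` stays false either way. String.intern(), shown for contrast, is the explicit, application-level way to obtain a single canonical instance.

```java
class DedupVsIntern {
    // Builds a brand-new String from the characters of s, so repeated calls
    // return objects that are equal in content but distinct on the heap.
    static String freshCopy(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String a = freshCopy("status:ACTIVE");
        String b = freshCopy("status:ACTIVE");
        System.out.println(a.equals(b));              // true: same content
        System.out.println(a == b);                   // false: two separate objects
        // intern() returns one canonical instance per distinct content.
        System.out.println(a.intern() == b.intern()); // true
    }
}
```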
How to enable String deduplication in Java 8
String deduplication is not enabled by default in the Java 8 JVM. You can enable it with the -XX:+UseStringDeduplication option. However, String deduplication is only available for the G1 garbage collector, so if you are not using G1 GC you cannot use this feature. This means that -XX:+UseStringDeduplication alone is not enough; you also need to turn on the G1 garbage collector with the -XX:+UseG1GC option. String deduplication also ignores relatively young Strings. The minimum age (the number of garbage collections a String has survived) before a String is processed is controlled by the -XX:StringDeduplicationAgeThreshold option, for example -XX:StringDeduplicationAgeThreshold=3. The default value of this parameter is 3.
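A quick way to confirm that the options actually took effect is to ask the running JVM. This sketch assumes a HotSpot-based JDK, where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class name DedupFlags is made up for the example. Run it with and without the flags shown in the comment and compare the output.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class DedupFlags {
    // Reads the current value of a HotSpot VM option. Returns "unsupported"
    // if the JVM does not expose the MXBean or does not know the option
    // (for example, a JDK older than 8u20 for the deduplication flags).
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (hotspot == null) {
            return "unsupported"; // not a HotSpot-based JVM
        }
        try {
            return hotspot.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "unsupported"; // option unknown to this JVM
        }
    }

    // Launch with, for example:
    //   java -XX:+UseG1GC -XX:+UseStringDeduplication DedupFlags
    public static void main(String[] args) {
        String[] flags = {"UseG1GC", "UseStringDeduplication", "StringDeduplicationAgeThreshold"};
        for (String flag : flags) {
            System.out.println(flag + " = " + vmOption(flag));
        }
    }
}
```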
Important points
Here are some important points about the String deduplication feature of Java 8:
1) This option is only available from the Java 8 update 20 JDK release onward.
2) This feature only works with the G1 garbage collector; it does not work with other garbage collectors, e.g. the Concurrent Mark Sweep (CMS) collector.
3) You need to provide both the -XX:+UseG1GC and -XX:+UseStringDeduplication JVM options to enable this feature: the first enables the G1 garbage collector and the second enables String deduplication within G1 GC.
4) You can optionally use the -XX:+PrintStringDeduplicationStatistics JVM option to print deduplication statistics and analyze what is happening.
5) Not every String is eligible for deduplication; in particular, young String objects are not considered. You can use the -XX:StringDeduplicationAgeThreshold option (e.g. -XX:StringDeduplicationAgeThreshold=3) to change the age at which Strings become eligible.
6) In general, this feature is expected to decrease heap usage by about 10%, which is a good result considering that you don't have to do any coding or refactoring.
7) String deduplication runs as a background task, concurrently with your application, without stopping it.
Wednesday, February 22, 2017
Types of NoSQL Database/Datastore
• Wide Row - Also known as wide-column stores, these databases store data in rows, and users can perform some query operations via column-based access. A wide-row store offers very high performance and a highly scalable architecture. Examples include Cassandra, HBase, and Google Bigtable.
• Columnar - Also known as column-oriented stores. Here the values of each column, across all rows, are stored together on disk. This is a great fit for analytical queries because it reduces disk seeks and encourages array-like processing. Examples include Amazon Redshift, Google BigQuery, and Teradata (with column partitioning).
• Key/Value - These NoSQL databases are among the least complex, as all of the data within consists of an indexed key and a value. Examples include Amazon DynamoDB, Riak, and Oracle NoSQL Database.
• Document - Expands on the basic idea of key-value stores: the values are "documents", which are more complex in that they contain structured data, and each document is assigned a unique key that is used to retrieve it. These databases are designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Examples include MongoDB and CouchDB.
• Graph - Designed for data whose relationships are well represented as a graph structure, with interconnected elements and an undetermined number of relationships between them. Examples include Neo4j, OrientDB, and TitanDB.
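The difference between row-oriented and column-oriented layouts can be illustrated with plain Java arrays. This is only an in-memory analogy with hypothetical names (RowVsColumn, Order), not how any particular database stores data on disk: summing one field in the columnar layout scans a single contiguous array instead of visiting every record.

```java
class RowVsColumn {
    // Row-oriented layout: each record keeps all of its fields together.
    static final class Order {
        final String customer;
        final double amount;

        Order(String customer, double amount) {
            this.customer = customer;
            this.amount = amount;
        }
    }

    // Aggregating one field row-wise visits every record object.
    static double totalRowWise(Order[] orders) {
        double sum = 0;
        for (Order o : orders) {
            sum += o.amount;
        }
        return sum;
    }

    // Column-oriented layout: each field lives in its own contiguous array,
    // so the aggregate over "amount" scans just that one array.
    static double totalColumnWise(double[] amountColumn) {
        double sum = 0;
        for (double amount : amountColumn) {
            sum += amount;
        }
        return sum;
    }

    public static void main(String[] args) {
        Order[] rows = {
            new Order("alice", 10.0), new Order("bob", 20.5), new Order("carol", 4.5)
        };
        // The same data stored column by column.
        String[] customerColumn = {"alice", "bob", "carol"};
        double[] amountColumn = {10.0, 20.5, 4.5};
        System.out.println(totalRowWise(rows));            // 35.0
        System.out.println(totalColumnWise(amountColumn)); // 35.0
        System.out.println(customerColumn.length + " customers"); // 3 customers
    }
}
```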
Tuesday, February 21, 2017
Why was PermGen removed in Java 8?
The Java Virtual Machine (JVM) uses an internal representation of its classes containing per-class metadata such as class hierarchy information, method data and information (such as bytecodes, stack and variable sizes), the runtime constant pool, resolved symbolic references, and vtables.
In the past (when custom class loaders weren’t that common), classes were mostly “static” and were infrequently unloaded or collected, and hence were labeled “permanent”. Also, since the classes are a part of the JVM implementation and not created by the application, they are considered “non-heap” memory.
For the HotSpot JVM prior to JDK 8, these “permanent” representations lived in an area called the “permanent generation”. The permanent generation was contiguous with the Java heap and was limited by -XX:MaxPermSize, which had to be set on the command line before starting the JVM or would default to 64M (85M for 64-bit scaled pointers). Collection of the permanent generation was tied to collection of the old generation, so whenever either one filled up, both the permanent generation and the old generation were collected. One obvious problem you can call out right away is the dependency on -XX:MaxPermSize: if the class metadata grows beyond the bounds of -XX:MaxPermSize, your application runs out of memory and you encounter an OOM (Out of Memory) error.
The drawbacks of PermGen, and the motivations for removing it, included:
- Fixed size at startup: difficult to tune.
- Internal HotSpot types were Java objects: they could move during a full GC, were opaque, not strongly typed, hard to debug, and needed meta-metadata.
- Simplify full collections: each collector needed special iterators for metadata.
- Deallocate class data concurrently rather than during a GC pause.
- Enable future improvements that were limited by PermGen.
In Java 8, the Permanent Generation (PermGen) space has been completely removed and effectively replaced by a new area, allocated in native memory, called Metaspace. As a consequence, the PermSize and MaxPermSize JVM arguments are ignored (the JVM prints a warning if they are supplied), and you will no longer get a java.lang.OutOfMemoryError: PermGen space error. Class metadata can still run out of room, however: if it exceeds -XX:MaxMetaspaceSize or the available native memory, the JVM throws java.lang.OutOfMemoryError: Metaspace instead.
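You can see Metaspace for yourself by listing the JVM's memory pools through the standard java.lang.management API. This is a sketch with a made-up class name; pool names are implementation-specific, and the comments assume HotSpot JDK 8 or later, which reports a non-heap pool named "Metaspace" and no PermGen pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

class MetaspaceCheck {
    // Returns the names of the memory pools reported by the running JVM.
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // On HotSpot JDK 8+, expect "Metaspace" (non-heap) and no "Perm Gen" pool.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%-30s %-16s used=%d bytes%n",
                    pool.getName(), pool.getType(), pool.getUsage().getUsed());
        }
    }
}
```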
Advantages of Metaspace
- Takes advantage of a Java Language Specification property: the lifetimes of classes and their metadata match the lifetime of their class loader
- Per-loader storage area (Metaspace)
- Linear allocation only
- No individual reclamation (except for RedefineClasses and class loading failure)
- No GC scan or compaction
- No relocation for metaspace objects