HashMap is one of the most frequently used collections in Java. It provides an efficient way to store and retrieve data in a key-value format. In this article, we will take a closer look at how HashMap works internally in Java.
Understanding Hashing
Before we dive into how HashMap works, we need to understand the concept of hashing. Hashing is the process of mapping a large data set of variable size to a smaller data set of a fixed size. In the case of HashMap, it involves mapping keys to indices in an underlying array.
Hash Function
The first step in using a HashMap is to define a hash function. The hash function takes a key as input and returns an integer that represents the index in the underlying array where the value associated with the key will be stored.
In Java, the hash function is implemented by the hashCode() method. The hashCode() method is defined in the Object class and can be overridden by subclasses to provide a custom implementation.
For example, let’s consider a class called Person that has two fields: name and age. We can override the hashCode() method to generate a hash code based on the name field as follows:
public class Person {
private String name;
private int age;
@Override
public int hashCode() {
return name.hashCode();
}
}
This implementation generates a hash code based on the name field of the Person object.
Collisions
In some cases, different keys may have the same hash code. This is known as a collision. Collisions can occur due to the finite size of the underlying array and the fact that there are an infinite number of possible keys.
When a collision occurs, the HashMap stores the key-value pairs in a linked list at the corresponding index in the underlying array. The linked list is called a bucket, and each bucket can store multiple key-value pairs.
Internal Structure of a HashMap
A HashMap in Java is implemented using an array of linked lists. The size of the array is determined by the initial capacity of the HashMap, which is specified when the HashMap is created.
When a key-value pair is added to the HashMap, the key is first passed through the hash function to generate an index in the underlying array. If there is no collision, the key-value pair is added to the corresponding bucket at the index in the array.
If there is a collision, the key-value pair is added to the end of the linked list in the corresponding bucket. When retrieving a value from the HashMap, the key is first passed through the hash function to determine the index in the underlying array. If there is no collision, the value at the corresponding index in the array is returned. If there is a collision, the linked list in the corresponding bucket is searched until the key is found.
Resizing
As key-value pairs are added to a HashMap, the size of the underlying array can become too small, resulting in frequent collisions and degraded performance. To avoid this issue, the HashMap automatically resizes its underlying array when the number of elements exceeds a certain threshold. This threshold is determined by the load factor of the HashMap, which is specified when the HashMap is created.
When resizing occurs, a new underlying array with a larger size is created, and all key-value pairs are rehashed and added to the new array. This process can be expensive in terms of time and memory usage, so it is important to choose an appropriate initial capacity and load factor to minimize the frequency of resizing.
Conclusion
HashMap is a powerful and widely used data structure in Java. Understanding how HashMap works internally can help developers optimize their use of this collection and avoid common performance pitfalls. By choosing an appropriate hash function, initial capacity, and load factor, developers can ensure that their applications perform well and can efficiently store and retrieve data in a key-value format. Additionally, understanding the internal structure of a HashMap can help developers avoid common issues such as collisions and resizing.
It is worth noting that there are other implementations of the Map interface in Java, such as TreeMap and LinkedHashMap. Each of these implementations has its own advantages and disadvantages, and the choice of which one to use depends on the specific requirements of the application.
In summary, a HashMap in Java is a collection that stores data in key-value pairs and uses a hash function to map keys to indices in an underlying array. Collisions can occur when different keys have the same hash code, and in these cases, the key-value pairs are stored in a linked list in the corresponding bucket. The HashMap automatically resizes its underlying array to maintain a reasonable balance between memory usage and performance. By understanding how HashMap works internally, developers can use this powerful data structure to optimize their applications and improve performance.