Java Set Explained: A Complete Guide to Unique Collections, HashSet, LinkedHashSet, and TreeSet

1. What Is a Set?

In Java programming, a Set is one of the most important collection types. The word “Set” comes from mathematics, and just like a mathematical set, it has the key characteristic that it cannot contain duplicate elements.
A Set is used when you want to manage only unique values, regardless of whether the data type is numbers, strings, or objects.

What Is the Difference Between Set and List?

The Java Collections Framework provides several data structures such as List and Map. Among them, Set and List are often compared. Their main differences are as follows:

  • List: Allows duplicate values and preserves element order (index-based).
  • Set: Does not allow duplicates, and element order is not guaranteed (except for certain implementations).

In short, a List is an “ordered collection,” while a Set is a “collection of unique elements.”
For example, if you want to manage user IDs without duplication, a Set is the ideal choice.
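
The difference can be seen in a minimal sketch like the one below. The class name ListVsSetDemo and the sample IDs are hypothetical and exist only for illustration; the same data is loaded into a List and into a Set so the sizes can be compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ListVsSetDemo {
    // Hypothetical sample data: three user IDs, one of them repeated
    static final List<String> USER_IDS = Arrays.asList("u001", "u002", "u001");

    // A List keeps every entry, including the repeated one
    static List<String> asList() {
        return new ArrayList<>(USER_IDS);
    }

    // A Set stores each distinct value only once
    static Set<String> asSet() {
        return new HashSet<>(USER_IDS);
    }

    public static void main(String[] args) {
        System.out.println(asList().size()); // 3
        System.out.println(asSet().size());  // 2
        System.out.println(asList().get(2)); // u001 (index access works on a List)
    }
}
```

Only the List offers get(index); the Set has no such method, which reflects that it makes no promise about position.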

Advantages of Using Set

  • Automatic duplicate elimination: Even when receiving a large amount of data from users, simply adding elements to a Set ensures that duplicates are stored only once. This eliminates the need for manual duplicate checks and simplifies implementation.
  • Efficient search and removal: Sets are designed to perform fast existence checks and removal operations, although performance varies depending on the implementation (such as HashSet or TreeSet).

When Should You Use a Set?

  • When managing information that must not be duplicated, such as user email addresses or IDs
  • When data uniqueness must be guaranteed
  • When you want to efficiently create a list of unique values from a large dataset

As shown above, Set is Java's standard mechanism for handling collections that must not contain duplicates.
In the following sections, we will explore Set specifications, usage patterns, and concrete code examples in detail.

2. Basic Specifications and Benefits of Set

In Java, Set is defined by the java.util.Set interface. Classes that implement this interface represent a collection of unique elements with no duplicates. Let’s take a closer look at the core specifications and advantages of Set.

Basic Characteristics of the Set Interface

A Set has the following characteristics:

  • No duplicate elements: If you try to add an element that already exists, it is not added. For example, even if you execute set.add("apple") twice, only one “apple” is stored.
  • Order is not guaranteed (implementation-dependent): The Set interface itself does not guarantee element order. However, implementations such as LinkedHashSet and TreeSet iterate over elements in a defined order.
  • Handling of null elements: Whether null is allowed depends on the implementation. For example, HashSet allows one null element, while TreeSet with natural ordering rejects null.

Importance of equals and hashCode

In hash-based Sets such as HashSet and LinkedHashSet, whether two elements are considered duplicates is determined by the equals and hashCode methods. (TreeSet instead relies on compareTo or a supplied Comparator.)
When using custom classes as Set elements, failing to override these methods consistently may cause unexpected duplicates or failed lookups.

  • equals: Determines whether two objects are logically equal
  • hashCode: Returns an int hash value used to locate elements quickly; objects that are equal according to equals must return the same hash code
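
To make the effect concrete, here is a small sketch comparing two hypothetical classes, RawPoint (no overrides) and Point (consistent overrides). Both names are invented for this illustration and are not part of any library.

```java
import java.util.HashSet;
import java.util.Set;

class IdentityVsEqualityDemo {
    // Hypothetical class that does NOT override equals/hashCode,
    // so it inherits identity-based behavior from Object
    static class RawPoint {
        final int x, y;
        RawPoint(int x, int y) { this.x = x; this.y = y; }
    }

    // Same fields, but equals and hashCode are overridden consistently
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    static int rawCount() {
        Set<RawPoint> set = new HashSet<>();
        set.add(new RawPoint(1, 2));
        set.add(new RawPoint(1, 2)); // different instance, treated as a new element
        return set.size();
    }

    static int pointCount() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // logically equal, rejected as a duplicate
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(rawCount());   // 2
        System.out.println(pointCount()); // 1
    }
}
```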

Benefits of Using Set

Sets provide several practical advantages:

  • Easy duplicate elimination: Adding values to a Set ensures that duplicates are never stored, eliminating the need for manual checks.
  • Efficient search and removal: Implementations such as HashSet provide fast lookup and removal, often outperforming a List, whose contains method has to scan elements one by one.
  • Simple and intuitive API: Basic methods like add, remove, and contains make Sets easy to use.

Internal Implementation and Performance

One of the most common Set implementations, HashSet, internally uses a HashMap to manage elements. This allows element addition, removal, and lookup to be performed with average O(1) time complexity.
If ordering or sorting is required, you can choose implementations such as LinkedHashSet or TreeSet depending on your needs.

3. Major Implementation Classes and Their Characteristics

Java provides several major implementations of the Set interface. Each has different characteristics, so choosing the right one for your use case is important.
Here, we will explain the three most commonly used implementations: HashSet, LinkedHashSet, and TreeSet.

HashSet

HashSet is the most commonly used Set implementation.

  • Characteristics
      • Does not preserve element order (the insertion order and iteration order may differ).
      • Internally uses a HashMap, providing fast add, search, and remove operations.
      • Allows one null element.
  • Typical Use Cases
      • Ideal when you want to eliminate duplicates and order does not matter.
  • Sample Code

import java.util.HashSet;
import java.util.Set;

Set<String> set = new HashSet<>();
set.add("apple");
set.add("banana");
set.add("apple"); // Duplicate is ignored

for (String s : set) {
    System.out.println(s); // Only "apple" and "banana" are printed
}

LinkedHashSet

LinkedHashSet extends the functionality of HashSet by preserving insertion order.

  • Characteristics
      • Elements are iterated in the order they were inserted.
      • Internally managed using a combination of a hash table and a linked list.
      • Slightly slower than HashSet, but useful when order matters.
  • Typical Use Cases
      • Best when you want to remove duplicates while maintaining insertion order.
  • Sample Code

import java.util.LinkedHashSet;
import java.util.Set;

Set<String> set = new LinkedHashSet<>();
set.add("apple");
set.add("banana");
set.add("orange");

for (String s : set) {
    System.out.println(s); // Printed in order: apple, banana, orange
}

TreeSet

TreeSet is a Set implementation that automatically sorts elements.

  • Characteristics
      • Internally uses a Red-Black Tree (a balanced tree structure).
      • Elements are kept sorted in their natural order (ascending) by default.
      • Custom ordering is possible by implementing Comparable in the element class or by passing a Comparator to the constructor.
      • null values are not allowed when natural ordering is used.
  • Typical Use Cases
      • Useful when you need both uniqueness and automatic sorting.
  • Sample Code

import java.util.Set;
import java.util.TreeSet;

Set<Integer> set = new TreeSet<>();
set.add(30);
set.add(10);
set.add(20);

for (Integer n : set) {
    System.out.println(n); // Printed in order: 10, 20, 30
}
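
Custom ordering with a Comparator might look like the following sketch. The class and method names are hypothetical. Note that a TreeSet treats two elements as duplicates when the comparator returns 0 for them, which the second method illustrates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetOrderingDemo {
    // Sorts integers in descending order instead of the natural ascending order
    static List<Integer> descending() {
        Comparator<Integer> reversed = Comparator.reverseOrder();
        Set<Integer> set = new TreeSet<>(reversed);
        set.add(30);
        set.add(10);
        set.add(20);
        return new ArrayList<>(set);
    }

    // Orders strings by length only; strings of equal length compare as 0,
    // so the TreeSet considers them duplicates
    static List<String> byLength() {
        Comparator<String> lengthOnly = Comparator.comparingInt(String::length);
        Set<String> set = new TreeSet<>(lengthOnly);
        set.add("kiwi");
        set.add("fig");
        set.add("pear"); // same length as "kiwi", so it is not added
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(descending()); // [30, 20, 10]
        System.out.println(byLength());   // [fig, kiwi]
    }
}
```

If distinct strings of the same length should all be kept, the comparator needs a tie-breaker, for example lengthOnly.thenComparing(Comparator.naturalOrder()).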

Summary

  • HashSet: Best for high performance when order is not required
  • LinkedHashSet: Use when insertion order matters
  • TreeSet: Use when automatic sorting is required

The right Set implementation depends on whether you need insertion order, sorted order, or neither.

4. Common Methods and How to Use Them

The Set interface provides various methods for collection operations. Below are the most commonly used methods, explained with examples.

Main Methods

  • add(E e): Adds an element to the Set. Returns false and leaves the Set unchanged if the element already exists.
  • remove(Object o): Removes the specified element from the Set. Returns true if the element was present.
  • contains(Object o): Returns true if the Set contains the specified element.
  • size(): Returns the number of elements in the Set.
  • clear(): Removes all elements from the Set.
  • isEmpty(): Returns true if the Set contains no elements.
  • iterator(): Returns an Iterator to traverse the elements.
  • toArray(): Converts the Set to an array.

Basic Usage Example

import java.util.HashSet;
import java.util.Set;

Set<String> set = new HashSet<>();

// Add elements
set.add("apple");
set.add("banana");
set.add("apple"); // Duplicate ignored

// Get size
System.out.println(set.size()); // 2

// Check existence
System.out.println(set.contains("banana")); // true

// Remove element
set.remove("banana");
System.out.println(set.contains("banana")); // false

// Clear all elements
set.clear();
System.out.println(set.isEmpty()); // true

Iterating Over a Set

Since Set does not support index-based access (e.g., set.get(0)), use an Iterator or enhanced for-loop.

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Enhanced for-loop
Set<String> set = new HashSet<>();
set.add("A");
set.add("B");
set.add("C");

for (String s : set) {
    System.out.println(s);
}

// Using Iterator
Iterator<String> it = set.iterator();
while (it.hasNext()) {
    String s = it.next();
    System.out.println(s);
}

Important Notes

  • Adding an existing element using add does not change the Set.
  • Element order depends on the implementation (HashSet: unordered, LinkedHashSet: insertion order, TreeSet: sorted).
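
The ordering differences can be observed by inserting the same values into each implementation, as in this sketch. The class name and sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class IterationOrderDemo {
    // Adds the same three values to the given Set and returns them
    // in the order the Set iterates over them
    static List<String> fill(Set<String> target) {
        target.add("banana");
        target.add("cherry");
        target.add("apple");
        return new ArrayList<>(target);
    }

    public static void main(String[] args) {
        // HashSet: order is unspecified and should not be relied on
        System.out.println(fill(new HashSet<>()));
        // LinkedHashSet: insertion order
        System.out.println(fill(new LinkedHashSet<>())); // [banana, cherry, apple]
        // TreeSet: natural (alphabetical) order
        System.out.println(fill(new TreeSet<>()));       // [apple, banana, cherry]
    }
}
```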

5. Common Use Cases and Typical Scenarios

Java Sets are widely used in many situations where duplicate values must be avoided. Below are some of the most common and practical use cases encountered in real-world development.

Creating a Unique List (Duplicate Removal)

When you want to extract only unique values from a large dataset, Set is extremely useful.
For example, it can automatically remove duplicates from user input or existing collections.

Example: Creating a Set from a List to Remove Duplicates

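import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

List<String> list = Arrays.asList("apple", "banana", "apple", "orange");
Set<String> set = new HashSet<>(list);

System.out.println(set); // Contains apple, banana, orange (HashSet iteration order is not guaranteed)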

Ensuring Input Uniqueness

Sets are ideal for scenarios where duplicate values must not be registered, such as user IDs or email addresses.
You can immediately determine whether a value already exists by checking the return value of add.

Set<String> emailSet = new HashSet<>();
boolean added = emailSet.add("user@example.com");
if (!added) {
    System.out.println("This value is already registered");
}

Storing Custom Classes and Implementing equals/hashCode

When storing custom objects in a Set, proper implementation of equals and hashCode is essential.
Without them, objects with the same logical content may be treated as different elements.

Example: Ensuring Uniqueness in a Person Class

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Person {
    String name;

    Person(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        Person person = (Person) obj;
        return Objects.equals(name, person.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}

// Example usage
Set<Person> people = new HashSet<>();
people.add(new Person("Taro"));
people.add(new Person("Taro")); // Ignored as a duplicate because equals and hashCode are overridden
System.out.println(people.size()); // 1

Fast Lookup and Data Filtering

Because Set provides fast lookups via contains, it is often used for filtering and comparison tasks.
Converting a List to a Set can significantly improve performance when repeatedly checking for existence.

Example: Fast Keyword Lookup

Set<String> keywordSet = new HashSet<>(Arrays.asList("java", "python", "c"));
boolean found = keywordSet.contains("python"); // true
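
For filtering, the Set is typically built once and then queried for each item. The helper below, keepKnown, is a hypothetical name used only for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class KeywordFilterDemo {
    // Returns the words that appear in the allowed Set, preserving input order.
    // Each contains call on a HashSet is an average O(1) lookup.
    static List<String> keepKnown(List<String> words, Set<String> allowed) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            if (allowed.contains(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> keywordSet = new HashSet<>(Arrays.asList("java", "python", "c"));
        List<String> input = Arrays.asList("java", "ruby", "c", "java");
        System.out.println(keepKnown(input, keywordSet)); // [java, c, java]
    }
}
```

The result is a List, so repeated matches such as the second "java" are kept; wrap it in a Set if the output should also be unique.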

6. Performance Considerations and Pitfalls

While Set is a powerful collection for managing unique elements, improper usage can lead to unexpected behavior or performance issues. This section explains key performance characteristics and common pitfalls.

Performance Differences by Implementation

  • HashSet: Uses a hash table internally, providing average O(1) performance for add, remove, and lookup operations. Performance degrades when many elements collide in the same bucket, for example because of a poorly distributed hashCode.
  • LinkedHashSet: Similar performance to HashSet, but with additional overhead due to maintaining insertion order. In most cases, the difference is negligible unless handling very large datasets.
  • TreeSet: Uses a Red-Black Tree internally, resulting in O(log n) performance for add, remove, and lookup operations. Slower than HashSet, but keeps elements sorted.

Using Mutable Objects as Set Elements

Extra caution is required when storing mutable objects in a Set.
HashSet locates elements by their hashCode, and TreeSet by compareTo (or a Comparator).
If modifying an element changes these values after insertion, lookup and removal may fail because the Set searches in the wrong place.

Example: Pitfall with Mutable Objects

Set<Person> people = new HashSet<>();
Person p = new Person("Taro");
people.add(p);

p.name = "Jiro"; // Modifying after insertion
people.contains(p); // May return false unexpectedly

To avoid such issues, it is strongly recommended to use immutable objects as Set elements whenever possible.
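
One way to follow this advice is sketched below. ImmutablePerson and its withName method are hypothetical names chosen for this example; the idea is that a "change" produces a new object rather than altering one that is already stored.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class ImmutablePersonDemo {
    // Hypothetical immutable element type: the field is final and has no setter
    static final class ImmutablePerson {
        private final String name;

        ImmutablePerson(String name) { this.name = name; }

        // Returns a new instance instead of modifying this one
        ImmutablePerson withName(String newName) {
            return new ImmutablePerson(newName);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ImmutablePerson)) return false;
            return Objects.equals(name, ((ImmutablePerson) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name);
        }
    }

    static boolean originalStillFound() {
        Set<ImmutablePerson> people = new HashSet<>();
        ImmutablePerson taro = new ImmutablePerson("Taro");
        people.add(taro);

        ImmutablePerson jiro = taro.withName("Jiro"); // taro itself is unchanged

        // The stored element keeps its hash code, so lookup still succeeds
        return people.contains(taro) && !people.contains(jiro);
    }

    public static void main(String[] args) {
        System.out.println(originalStillFound()); // true
    }
}
```

To actually rename an entry in the Set, remove the old object and add the new one.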

Handling null Values

  • HashSet / LinkedHashSet: Allow one null element
  • TreeSet: Does not allow null when using natural ordering (add throws NullPointerException)
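
The following sketch checks both behaviors. The class and method names are hypothetical, and the TreeSet here uses natural ordering (no Comparator).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class NullHandlingDemo {
    // HashSet stores null like any other element, and only once
    static boolean hashSetAcceptsNull() {
        Set<String> set = new HashSet<>();
        set.add(null);
        set.add(null); // duplicate null is ignored
        return set.contains(null) && set.size() == 1;
    }

    // TreeSet with natural ordering cannot compare null and throws
    static boolean treeSetRejectsNull() {
        Set<String> set = new TreeSet<>();
        try {
            set.add(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(hashSetAcceptsNull()); // true
        System.out.println(treeSetRejectsNull()); // true
    }
}
```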

Other Important Notes

  • Modification during iteration: Modifying a Set directly while iterating over it may cause a ConcurrentModificationException. Use Iterator.remove() (or removeIf, available since Java 8) instead.
  • Choosing the right implementation: Use LinkedHashSet or TreeSet when order matters. HashSet does not guarantee order.
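
Safe removal during traversal can be sketched in two ways, shown below with hypothetical method names. Both work on a copy of the input so the caller's Set is left untouched.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

class SafeRemovalDemo {
    // Removes even numbers through the Iterator, which keeps the
    // iteration state consistent with the Set
    static Set<Integer> removeEvensWithIterator(Set<Integer> source) {
        Set<Integer> set = new TreeSet<>(source);
        Iterator<Integer> it = set.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return set;
    }

    // Same result using Collection.removeIf (Java 8 and later)
    static Set<Integer> removeEvensWithRemoveIf(Set<Integer> source) {
        Set<Integer> set = new TreeSet<>(source);
        set.removeIf(n -> n % 2 == 0);
        return set;
    }

    public static void main(String[] args) {
        Set<Integer> input = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(removeEvensWithIterator(input)); // [1, 3, 5]
        System.out.println(removeEvensWithRemoveIf(input)); // [1, 3, 5]
    }
}
```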

7. Comparison Chart (Overview)

The table below summarizes the differences between major Set implementations for easy comparison.

| Implementation | No Duplicates | Order Preserved | Sorted | Performance | null Allowed | Typical Use Case |
|---|---|---|---|---|---|---|
| HashSet | Yes | No | No | Fast (O(1)) | One allowed | Duplicate removal, order not required |
| LinkedHashSet | Yes | Yes (Insertion order) | No | Slightly slower than HashSet | One allowed | Duplicate removal with order preservation |
| TreeSet | Yes | No | Yes (Automatic) | O(log n) | Not allowed | Duplicate removal with sorting |

Key Takeaways

  • HashSet: The default choice when order is irrelevant and performance is critical.
  • LinkedHashSet: Best when insertion order must be preserved.
  • TreeSet: Ideal when automatic sorting is required.

8. Frequently Asked Questions (FAQ)

Q1. Can primitive types (int, char, etc.) be used in a Set?

A1. Not directly. Use wrapper classes such as Integer or Character instead; autoboxing converts primitive values automatically when you add them.
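
A short sketch of how this looks in practice (the class name is hypothetical): the type parameter must be a wrapper class, but primitive values can still be passed to add and contains.

```java
import java.util.HashSet;
import java.util.Set;

class AutoboxingDemo {
    static Set<Integer> build() {
        // Set<int> would not compile; the element type must be Integer
        Set<Integer> numbers = new HashSet<>();
        int primitive = 5;
        numbers.add(primitive);          // autoboxed to Integer
        numbers.add(Integer.valueOf(5)); // equal value, so it is not added again
        return numbers;
    }

    public static void main(String[] args) {
        Set<Integer> numbers = build();
        System.out.println(numbers.size());      // 1
        System.out.println(numbers.contains(5)); // true
    }
}
```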

Q2. What happens if the same value is added multiple times?

A2. Only the first insertion is stored. The add method returns false if the element already exists.

Q3. When should I use List vs Set?

A3. Use List when order or duplicates matter, and Set when uniqueness is required.

Q4. What is required to store custom objects in a Set?

A4. Properly override equals and hashCode. For TreeSet, the class must also implement Comparable, or you must supply a Comparator.

Q5. How can I preserve insertion order?

A5. Use LinkedHashSet.

Q6. How can I sort elements automatically?

A6. Use TreeSet.

Q7. Can Set contain null values?

A7. HashSet and LinkedHashSet allow one null; TreeSet with natural ordering does not.

Q8. How do I get the size of a Set?

A8. Use size().

Q9. How can I convert a Set to a List or array?

A9.

  • To array: toArray()
  • To List: new ArrayList<>(set)
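
Both conversions are sketched below with hypothetical helper names. The example uses a LinkedHashSet so that the resulting order is predictable; with a HashSet the order of the output would be unspecified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ConversionDemo {
    // Copies the Set's elements into a new List in iteration order
    static List<String> toList(Set<String> set) {
        return new ArrayList<>(set);
    }

    // The typed overload toArray(T[]) returns String[] rather than Object[]
    static String[] toArray(Set<String> set) {
        return set.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Set<String> set = new LinkedHashSet<>(Arrays.asList("apple", "banana"));
        System.out.println(toList(set));                   // [apple, banana]
        System.out.println(Arrays.toString(toArray(set))); // [apple, banana]
    }
}
```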

Q10. Can I remove elements while iterating?

A10. Yes, but not by calling the Set's own remove method inside the loop. Use Iterator.remove() or removeIf.

9. Conclusion

This article covered Java Set collections from fundamentals to advanced usage. Key points include:

  • Set is designed to manage collections of unique elements, making it ideal for duplicate elimination.
  • Major implementations include HashSet (fast, unordered), LinkedHashSet (insertion order), and TreeSet (sorted).
  • Common use cases include duplicate removal, uniqueness checks, managing custom objects, and fast lookups.
  • Understanding performance characteristics and pitfalls such as mutable objects and iteration rules is essential.
  • The comparison table and FAQ provide practical guidance for real-world development.

Mastering Set collections makes Java programming cleaner, safer, and more efficient.
Next, consider combining Sets with Lists or Maps to build more advanced data structures and solutions.