Java Collections - Part 4: Aggregate Operations

Gökhan Kanber
Feb 16, 2021

This article is a part of the Java Programming Language article series.

In JDK 8 and later, the preferred way to iterate over a collection is to obtain a stream and perform aggregate operations on it.

A for-each loop

The aggregate operation forEach
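
The two forms can be compared side by side. The following is a minimal sketch, assuming a hypothetical list of names (the class and method names are illustrative only). Both methods build the same string so the results are easy to compare.

```java
import java.util.Arrays;
import java.util.List;

class ForEachDemo {
    // A for-each loop: the application controls the iteration.
    static String withLoop(List<String> names) {
        StringBuilder out = new StringBuilder();
        for (String name : names) {
            out.append(name).append(' ');
        }
        return out.toString().trim();
    }

    // The aggregate operation forEach: obtain a stream and pass the behavior as a lambda.
    static String withForEach(List<String> names) {
        StringBuilder out = new StringBuilder();
        // Appending to a shared StringBuilder is acceptable here only because the stream is sequential.
        names.stream().forEach(name -> out.append(name).append(' '));
        return out.toString().trim();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ada", "Alan", "Grace"); // hypothetical sample data
        System.out.println(withLoop(names));
        System.out.println(withForEach(names));
    }
}
```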

Pipeline

  • A sequence of aggregate operations

Pipeline Components

  • A source
  • Zero or more intermediate operations
  • A terminal operation

A source

  • A collection, an array, or an I/O channel

Zero or more intermediate operations

  • Produces a new stream

The filter method

  • Returns a new stream that contains elements that match its predicate
    Example: returns a stream of the people whose last name starts with A
  • Predicate: the operation's parameter, given as a lambda expression
    person -> person.getLastName().startsWith("A")
    Returns true if the element matches
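
As a sketch of the filter method in use, the example below assumes a hypothetical, minimal Person class. Only getLastName appears in the text above; the class itself, the sample names, and the helper method are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterDemo {
    // Hypothetical minimal Person type; only getLastName is taken from the article.
    static class Person {
        private final String lastName;
        Person(String lastName) { this.lastName = lastName; }
        String getLastName() { return lastName; }
    }

    // Returns the last names of the people whose last name starts with "A".
    static List<String> lastNamesStartingWithA(List<Person> people) {
        return people.stream()
                .filter(person -> person.getLastName().startsWith("A")) // predicate: true on a match
                .map(Person::getLastName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Adams"), new Person("Brown"), new Person("Allen"));
        System.out.println(lastNamesStartingWithA(people)); // prints [Adams, Allen]
    }
}
```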

A terminal operation

  • Produces a non-stream result
  • A primitive value: double
  • Collection
  • forEach: no value

Stream

  • A sequence of elements
  • Unlike a collection, not a data structure that stores elements
  • Carries values from a source through a pipeline
  • The stream method creates a stream from a collection

The aggregate operations filter and forEach

A for-each loop
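
A possible version of this comparison is sketched below, assuming a hypothetical list of last names. The pipeline has a source (the list), one intermediate operation (filter), and a terminal operation (forEach); the loop expresses the same logic with an if statement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PipelineDemo {
    // Pipeline: source (the list) -> filter (intermediate) -> forEach (terminal).
    static List<String> withPipeline(List<String> lastNames) {
        List<String> matches = new ArrayList<>();
        lastNames.stream()
                .filter(name -> name.startsWith("A"))
                .forEach(matches::add); // adding to a plain list is safe only on a sequential stream
        return matches;
    }

    // The same logic written as a for-each loop.
    static List<String> withLoop(List<String> lastNames) {
        List<String> matches = new ArrayList<>();
        for (String name : lastNames) {
            if (name.startsWith("A")) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> lastNames = Arrays.asList("Adams", "Brown", "Allen"); // hypothetical data
        System.out.println(withPipeline(lastNames)); // prints [Adams, Allen]
        System.out.println(withLoop(lastNames));     // prints [Adams, Allen]
    }
}
```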

Differences Between Aggregate Operations and Iterators

They use internal iteration

  • The JDK, not the application, determines how to iterate over the collection

They process elements from a stream

  • Not directly from a collection

They support behavior as parameters

  • For most aggregate operations, parameters can be specified as lambda expressions

Internal iteration

  • Application determines what collection it iterates
  • The JDK determines how to iterate the collection
  • Can take advantage of parallel computing by dividing a problem into subproblems

External iteration

  • Application determines both what collection it iterates and how it iterates it
  • Can only iterate over the elements of a collection sequentially

Reduction

  • Terminal operations
    Finding the average of values
    Grouping elements into categories
  • Return one value by combining the contents of a stream: average, sum, min, max, and count
  • Return a collection
  • General-purpose operations: reduce and collect
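
The built-in reduction operations listed above might be used as follows. This is a small sketch over a hypothetical list of ages; the default values passed to orElse are arbitrary choices for the empty-stream case.

```java
import java.util.Arrays;
import java.util.List;

class ReductionDemo {
    static double average(List<Integer> ages) {
        // average returns an OptionalDouble because the stream may be empty
        return ages.stream().mapToInt(Integer::intValue).average().orElse(0.0);
    }

    static int sum(List<Integer> ages) {
        return ages.stream().mapToInt(Integer::intValue).sum();
    }

    static int max(List<Integer> ages) {
        return ages.stream().mapToInt(Integer::intValue).max().orElse(Integer.MIN_VALUE);
    }

    static long count(List<Integer> ages) {
        return ages.stream().count();
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(20, 30, 40); // hypothetical sample data
        System.out.println(average(ages)); // prints 30.0
        System.out.println(sum(ages));     // prints 90
        System.out.println(max(ages));     // prints 40
        System.out.println(count(ages));   // prints 3
    }
}
```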

The reduce method

The reduce method arguments

identity

  • Initial value of the reduction
  • Default result if there are no elements in the stream

accumulator

  • Has two parameters: a, b
  • a: partial result of the reduction
    The sum of all processed integers so far
  • b: next element of the stream
    An integer
  • Returns a new partial result

The reduce operation always returns a new value

The accumulator function also returns a new value every time it processes an element of a stream
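
A minimal sketch of reduce with an identity and an accumulator, assuming a hypothetical list of integer ages:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ReduceDemo {
    static int sumOfAges(List<Integer> ages) {
        return ages.stream().reduce(
                0,                 // identity: initial value, and the result for an empty stream
                (a, b) -> a + b);  // accumulator: a is the partial sum, b is the next element
    }

    public static void main(String[] args) {
        System.out.println(sumOfAges(Arrays.asList(20, 30, 40)));         // prints 90
        System.out.println(sumOfAges(Collections.<Integer>emptyList())); // prints 0
    }
}
```

Each call to the accumulator produces a new Integer, which is why reduce is a poor fit when the result is a larger mutable object such as a collection.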

To reduce the elements of a stream to a more complex object, use the collect method

The collect method

  • Modifies or mutates an existing value when it processes an element
  • Returns one value

The collect method arguments

supplier

  • Factory function
  • Constructs new instances of the result container
  • A lambda expression or a method reference as opposed to a value like the identity element in the reduce operation

accumulator

  • Incorporates a stream element into a result container
  • Does not return a value

combiner

  • Merges the contents of two result containers
  • Does not return a value
  • With a parallel stream, the JDK accumulates each part of the stream into its own result container, created by the supplier, and uses the combiner to merge the containers
    You do not have to worry about synchronizing the container
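
The three arguments can be sketched with a small mutable result container. The Averager class below is a hypothetical helper written for this example; it is not part of the JDK.

```java
import java.util.Arrays;
import java.util.List;

class CollectDemo {
    // Hypothetical mutable result container: keeps a running total and a count.
    static class Averager {
        private int total = 0;
        private int count = 0;

        // Accumulator: incorporates one stream element, returns nothing.
        void accept(int value) { total += value; count++; }

        // Combiner: merges another container into this one, returns nothing.
        void combine(Averager other) { total += other.total; count += other.count; }

        double average() { return count > 0 ? (double) total / count : 0.0; }
    }

    static double averageAge(List<Integer> ages) {
        Averager result = ages.stream().collect(
                Averager::new,       // supplier
                Averager::accept,    // accumulator
                Averager::combine);  // combiner
        return result.average();
    }

    public static void main(String[] args) {
        System.out.println(averageAge(Arrays.asList(20, 30, 40))); // prints 30.0
    }
}
```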

The groupingBy method arguments

A classification function

An instance of Collector

  • Downstream collector
  • Applies to the results of another collector
  • Multilevel reduction: a pipeline that contains one or more downstream collectors
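
As an illustration of a classification function combined with a downstream collector, the sketch below groups a hypothetical list of last names by their first letter and counts each group. The data and method name are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingDemo {
    static Map<Character, Long> countByInitial(List<String> lastNames) {
        return lastNames.stream().collect(
                Collectors.groupingBy(
                        name -> name.charAt(0),    // classification function
                        Collectors.counting()));   // downstream collector applied to each group
    }

    public static void main(String[] args) {
        List<String> lastNames = Arrays.asList("Adams", "Allen", "Brown"); // hypothetical data
        Map<Character, Long> counts = countByInitial(lastNames);
        System.out.println(counts.get('A')); // prints 2
        System.out.println(counts.get('B')); // prints 1
    }
}
```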

The reducing method arguments

identity

  • Initial value of the reduction
  • Default result if there are no elements in the stream

mapper

  • The reducing operation applies this mapper function to all stream elements

operation

  • The operation function is used to reduce the mapped values
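
One way to see the three arguments together is to use Collectors.reducing as a downstream collector. The sketch below assumes a hypothetical list of last names and sums the name lengths within each first-letter group.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ReducingDemo {
    static Map<Character, Integer> totalLengthByInitial(List<String> lastNames) {
        return lastNames.stream().collect(
                Collectors.groupingBy(
                        name -> name.charAt(0),
                        Collectors.reducing(
                                0,                // identity
                                String::length,   // mapper: applied to every element
                                Integer::sum)));  // operation: reduces the mapped values
    }

    public static void main(String[] args) {
        List<String> lastNames = Arrays.asList("Adams", "Allen", "Brown"); // hypothetical data
        Map<Character, Integer> totals = totalLengthByInitial(lastNames);
        System.out.println(totals.get('A')); // prints 10
        System.out.println(totals.get('B')); // prints 5
    }
}
```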

Parallelism

  • Dividing a problem into subproblems
  • Solving these subproblems in parallel
  • With each subproblem running in a separate thread
  • Combining the results

Collections are not thread-safe

Multiple threads cannot manipulate a collection without introducing thread interference or memory consistency errors

  • The Collections Framework provides synchronization wrappers
  • Adds automatic synchronization to an arbitrary collection
  • Makes it thread-safe
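
A synchronization wrapper might be applied like this. The sketch wraps an ArrayList with Collections.synchronizedList and lets two threads add to it; the thread setup and method name are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SyncWrapperDemo {
    // Two threads each add perThread elements to a synchronized list; returns the final size.
    static int addFromTwoThreads(int perThread) {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                shared.add(i); // each add call is synchronized by the wrapper
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(addFromTwoThreads(1000)); // prints 2000
    }
}
```

With a plain ArrayList in place of the wrapper, the final size could be smaller than expected or the program could fail, because ArrayList is not thread-safe.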

Executing Streams in Parallel

  • Collection.parallelStream
  • BaseStream.parallel
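
Both ways of obtaining a parallel stream are sketched below, assuming a hypothetical list of integers. For a stateless, associative reduction such as sum, the result matches the sequential one.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelDemo {
    // Helper (hypothetical): the integers 1..n as a list.
    static List<Integer> oneTo(int n) {
        return IntStream.rangeClosed(1, n).boxed().collect(Collectors.toList());
    }

    static int sumSequential(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    // Collection.parallelStream creates a parallel stream directly.
    static int sumWithParallelStream(List<Integer> values) {
        return values.parallelStream().mapToInt(Integer::intValue).sum();
    }

    // BaseStream.parallel switches an existing stream to parallel execution.
    static int sumWithParallel(List<Integer> values) {
        return values.stream().parallel().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> values = oneTo(100);
        System.out.println(sumSequential(values));         // prints 5050
        System.out.println(sumWithParallelStream(values)); // prints 5050
        System.out.println(sumWithParallel(values));       // prints 5050
    }
}
```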

Concurrent Reduction

The Java runtime performs a concurrent reduction if all of the following are true for a particular pipeline that contains the collect operation:

  • The stream is parallel
  • The parameter of the collect operation, the collector, has the characteristic Collector.Characteristics.CONCURRENT
    Invoke the Collector.characteristics method to determine the characteristics of a collector
  • Either the stream is unordered, or the collector has the characteristic Collector.Characteristics.UNORDERED
    Invoke the BaseStream.unordered operation to ensure that the stream is unordered
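
A sketch of a pipeline that meets these conditions, using Collectors.groupingByConcurrent on a parallel stream. The sample data and method names are assumptions; the second method shows how the characteristics of a collector can be inspected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ConcurrentReductionDemo {
    // Parallel stream plus a collector that is both CONCURRENT and UNORDERED.
    static ConcurrentMap<Character, Long> countByInitial(List<String> lastNames) {
        return lastNames.parallelStream().collect(
                Collectors.groupingByConcurrent(name -> name.charAt(0), Collectors.counting()));
    }

    // Checks the collector's characteristics with Collector.characteristics.
    static boolean isConcurrentAndUnordered() {
        Collector<String, ?, ConcurrentMap<Character, Long>> collector =
                Collectors.groupingByConcurrent(name -> name.charAt(0), Collectors.counting());
        return collector.characteristics().contains(Collector.Characteristics.CONCURRENT)
                && collector.characteristics().contains(Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        List<String> lastNames = Arrays.asList("Adams", "Allen", "Brown"); // hypothetical data
        System.out.println(countByInitial(lastNames).get('A')); // prints 2
        System.out.println(isConcurrentAndUnordered());         // prints true
    }
}
```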

Ordering

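  • The order in which a pipeline processes the elements of a stream depends on whether the stream is executed in serial or in parallel, on the source of the stream, and on the intermediate operations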

You may lose the benefits of parallelism if you use operations like forEachOrdered with parallel streams
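
The effect of forEachOrdered can be sketched as follows, assuming a hypothetical list of integers. On a parallel stream, forEachOrdered processes elements in the encounter order of the source, whereas forEach gives no such guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OrderingDemo {
    // Collects the elements of a parallel stream in encounter order.
    static List<Integer> orderedFromParallel(List<Integer> values) {
        // The synchronized wrapper is used defensively; forEachOrdered handles one element at a time.
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        values.parallelStream().forEachOrdered(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8); // hypothetical data
        System.out.println(orderedFromParallel(values)); // prints [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```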

Side Effects

  • Laziness
  • Interference
  • Stateful Lambda Expressions

Laziness

All intermediate operations are lazy

  • An expression, method, or algorithm is lazy if its value is evaluated only when it is required
  • Intermediate operations do not start processing the contents of the stream until the terminal operation commences

In a pipeline such as filter-mapToInt-average

  • The average operation could obtain the first several integers from the stream created by the mapToInt operation, which obtains elements from the filter operation
  • The average operation would repeat this process until it had obtained all required elements from the stream
  • Then it would calculate the average
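
Laziness can be observed by counting how often the filter predicate runs. This is a sketch for demonstration only: the AtomicInteger counter is a deliberate side effect, and the data and method names are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazinessDemo {
    // Builds a pipeline without a terminal operation; the predicate never runs.
    static int inspectedWithoutTerminal(List<Integer> values) {
        AtomicInteger inspected = new AtomicInteger();
        Stream<Integer> pending = values.stream()
                .filter(v -> { inspected.incrementAndGet(); return v > 0; });
        return inspected.get(); // nothing has been processed yet
    }

    // Adds the terminal operation average; now every element passes through the filter.
    static int inspectedWithTerminal(List<Integer> values) {
        AtomicInteger inspected = new AtomicInteger();
        values.stream()
                .filter(v -> { inspected.incrementAndGet(); return v > 0; })
                .mapToInt(Integer::intValue)
                .average();
        return inspected.get();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3); // hypothetical data
        System.out.println(inspectedWithoutTerminal(values)); // prints 0
        System.out.println(inspectedWithTerminal(values));    // prints 3
    }
}
```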

Interference

  • Lambda expressions in stream operations should not interfere
  • Interference occurs when the source of a stream is modified while a pipeline processes the stream
  • For a non-concurrent source such as an ArrayList, this typically throws a ConcurrentModificationException

A pipeline begins execution when its terminal operation is invoked and ends execution when the terminal operation completes
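
Interference might look like the sketch below, which assumes an ArrayList of strings as the source. The peek lambda adds to the source list while the pipeline is running, and the method reports whether a ConcurrentModificationException was thrown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class InterferenceDemo {
    static boolean throwsOnInterference() {
        List<String> words = new ArrayList<>(Arrays.asList("one", "two"));
        try {
            words.stream()
                    .peek(s -> words.add("three"))   // interference: modifies the stream's source
                    .reduce((a, b) -> a + " " + b);  // terminal operation: the pipeline runs here
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnInterference());
    }
}
```

Nothing fails when peek is declared, because intermediate operations are lazy; the exception only surfaces once the terminal operation runs.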

Stateful Lambda Expressions

  • A stateful lambda expression is one whose result depends on any state that might change during the execution of a pipeline
  • Avoid using stateful lambda expressions as parameters in stream operations

The lambda expression e -> { parallelStorage.add(e); return e; } is a stateful lambda expression

Its result can vary every time the code is run

  • Lambda expressions passed as parameters are meant to customize the behavior of a particular aggregate operation, not to change shared state
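
The stateful lambda expression above can be tried with the following sketch. The name parallelStorage comes from the text; the surrounding class, data, and method are assumptions. The list ends up with the same elements on every run, but the order in which they were added may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class StatefulLambdaDemo {
    // Records each element in a shared list from inside map; shown only as something to avoid.
    static List<Integer> recordInParallel(List<Integer> values) {
        List<Integer> parallelStorage = Collections.synchronizedList(new ArrayList<>());
        values.parallelStream()
                .map(e -> { parallelStorage.add(e); return e; }) // stateful lambda expression
                .forEachOrdered(e -> { });                       // terminal operation runs the pipeline
        return parallelStorage;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8); // hypothetical data
        // The insertion order is not deterministic, so the printed order may change between runs.
        System.out.println(recordInParallel(values));
    }
}
```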
