Java Collections - Part 4: Aggregate Operations
This article is a part of the Java Programming Language article series.
In JDK 8 and later, the preferred method of iterating over a collection is to obtain a stream and perform aggregate operations on it.
A for-each loop
The aggregate operation forEach
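The two captions above contrast a for-each loop with the forEach aggregate operation. A minimal sketch of both, assuming a plain list of names (the class ForEachDemo, its method names, and the sample data are illustrative and not from the original article):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ForEachDemo {
    // A for-each loop: the application pulls each element in turn.
    static List<String> withLoop(List<String> names) {
        List<String> seen = new ArrayList<>();
        for (String name : names) {
            seen.add(name);
        }
        return seen;
    }

    // The aggregate operation forEach: the stream calls the lambda per element.
    // Appending to a local list is acceptable here only because the stream is sequential.
    static List<String> withForEach(List<String> names) {
        List<String> seen = new ArrayList<>();
        names.stream().forEach(name -> seen.add(name));
        return seen;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ada", "Grace", "Linus");
        System.out.println(withLoop(names));
        System.out.println(withForEach(names));
    }
}
```

Both methods visit the same elements in the same order; the difference is who drives the traversal.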
Pipeline
- A sequence of aggregate operations
Pipeline Components
- A source
- Zero or more intermediate operations
- A terminal operation
A source
- A collection, an array, a generator function, or an I/O channel
Zero or more intermediate operations
- Produces a new stream
The filter method
- Returns a new stream that contains elements that match its predicate
Returns a stream that contains people whose last name starts with A
- predicate: the operation's parameter, a lambda expression
person -> person.getLastName().startsWith("A")
Returns the boolean value true if it matches
A terminal operation
- Produces a non-stream result
- A primitive value: double
- Collection
- forEach: no value
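The three kinds of terminal result listed above can be sketched in one small pipeline. This example assumes a list of integers as the source; the class TerminalDemo and its method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TerminalDemo {
    // Terminal operation average: produces a primitive double (wrapped in OptionalDouble).
    static double averageOfEvens(List<Integer> values) {
        return values.stream()                 // source
                .filter(v -> v % 2 == 0)       // intermediate operation: new stream
                .mapToInt(Integer::intValue)   // intermediate operation: IntStream
                .average()                     // terminal operation
                .orElse(0.0);                  // fallback when no element matched
    }

    // Terminal operation collect: produces a collection.
    static List<Integer> evens(List<Integer> values) {
        return values.stream()
                .filter(v -> v % 2 == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(averageOfEvens(values));
        System.out.println(evens(values));
        // Terminal operation forEach: returns no value, it only performs an action.
        values.stream().filter(v -> v % 2 == 0).forEach(System.out::println);
    }
}
```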
Stream
- A sequence of elements
- Unlike a collection, not a data structure that stores elements
- Carries values from a source through a pipeline
- Created from a collection by calling the stream method
The aggregate operations filter and forEach
A for-each loop
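As a sketch of the comparison named in the two captions above, the following applies the last-name predicate from earlier with filter and forEach, then does the same work with a for-each loop. The Person class here is a hypothetical minimal stand-in with only a last name, and FilterForEachDemo is an illustrative name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterForEachDemo {
    // Hypothetical minimal Person type, defined only for this illustration.
    static class Person {
        private final String lastName;
        Person(String lastName) { this.lastName = lastName; }
        String getLastName() { return lastName; }
    }

    // The aggregate operations filter and forEach.
    static List<String> withAggregateOperations(List<Person> roster) {
        List<String> matches = new ArrayList<>();
        roster.stream()
                .filter(person -> person.getLastName().startsWith("A"))
                .forEach(person -> matches.add(person.getLastName()));
        return matches;
    }

    // The equivalent for-each loop.
    static List<String> withLoop(List<Person> roster) {
        List<String> matches = new ArrayList<>();
        for (Person person : roster) {
            if (person.getLastName().startsWith("A")) {
                matches.add(person.getLastName());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Person> roster = Arrays.asList(
                new Person("Adams"), new Person("Baker"), new Person("Allen"));
        System.out.println(withAggregateOperations(roster));
        System.out.println(withLoop(roster));
    }
}
```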
Differences Between Aggregate Operations and Iterators
They use internal iteration
- They contain no method like next to instruct them to process the next element of the collection
They process elements from a stream
- Not directly from a collection
They support behavior as parameters
- For most aggregate operations, parameters can be specified as lambda expressions
Internal iteration
- Application determines what collection it iterates
- The JDK determines how to iterate the collection
- Can take advantage of parallel computing by dividing a problem into subproblems
External iteration
- Application determines both what collection it iterates and how it iterates it
- Iterates over the elements of a collection sequentially
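To make the internal/external distinction concrete, here is a small sketch that sums a list both ways. The class IterationDemo and its method names are illustrative assumptions:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IterationDemo {
    // External iteration: the application asks for each next element itself,
    // so it fixes both what is iterated and how (one element after another).
    static int sumExternal(List<Integer> values) {
        int sum = 0;
        Iterator<Integer> iterator = values.iterator();
        while (iterator.hasNext()) {
            sum += iterator.next();
        }
        return sum;
    }

    // Internal iteration: the application names the collection and the work;
    // the stream implementation decides how the elements are traversed.
    static int sumInternal(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(sumExternal(values));
        System.out.println(sumInternal(values));
    }
}
```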
Reduction
- Terminal operations
Finding the average of values
Grouping elements into categories
- Return one value by combining the contents of a stream: average, sum, min, max, and count
- Return a collection
- General-purpose operations: reduce and collect
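The single-value reductions listed above are available directly on the primitive streams. A brief sketch using IntStream, assuming an int array as the source (the class name SpecializedReductions and the fallback values for empty input are choices made for this example):

```java
import java.util.stream.IntStream;

class SpecializedReductions {
    static int sum(int[] values) {
        return IntStream.of(values).sum();
    }

    // average, min, and max return an Optional because the stream may be empty.
    static double average(int[] values) {
        return IntStream.of(values).average().orElse(0.0);
    }

    static int min(int[] values) {
        return IntStream.of(values).min().orElse(Integer.MAX_VALUE);
    }

    static int max(int[] values) {
        return IntStream.of(values).max().orElse(Integer.MIN_VALUE);
    }

    static long count(int[] values) {
        return IntStream.of(values).count();
    }

    public static void main(String[] args) {
        int[] ages = {21, 34, 29};
        System.out.println(sum(ages) + " " + average(ages) + " "
                + min(ages) + " " + max(ages) + " " + count(ages));
    }
}
```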
The reduce method
The reduce method arguments
identity
- Initial value of the reduction
- Default result if there are no elements in the stream
accumulator
- Has two parameters: a, b
- a: partial result of the reduction
The sum of all processed integers so far
- b: next element of the stream
An integer
Returns a new partial result
The reduce operation always returns a new value
The accumulator function also returns a new value every time it processes an element of a stream
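The identity and accumulator arguments described above can be seen in a sum computed with reduce. This is a minimal sketch; the class name ReduceDemo is illustrative:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ReduceDemo {
    static int sum(List<Integer> values) {
        return values.stream().reduce(
                0,                  // identity: initial value, and the result for an empty stream
                (a, b) -> a + b);   // accumulator: a is the partial sum, b is the next element
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3, 4)));
        System.out.println(sum(Collections.<Integer>emptyList()));
    }
}
```

Each call to the accumulator returns a new Integer rather than updating one in place, which is why reduce is a poor fit for building up large containers.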
To reduce the elements of a stream to a more complex object, use the collect method
The collect method
- Modifies or mutates an existing value when it processes an element
- Returns one value
The collect method arguments
supplier
- Factory function
- Constructs new instances of the result container
- A lambda expression or a method reference as opposed to a value like the identity element in the reduce operation
accumulator
- Incorporates a stream element into a result container
- Does not return a value
combiner
- Merges the contents of two result containers
- Does not return a value
- With a parallel stream, the supplier creates a separate result container for each partition of the stream, and the combiner merges the containers
The accumulator and combiner therefore need no additional synchronization
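The supplier, accumulator, and combiner arguments can be sketched with a small mutable container that tracks a running total and count. The nested Averager class below is an assumption made for this example, as is the class name CollectDemo:

```java
import java.util.Arrays;
import java.util.List;

class CollectDemo {
    // Mutable result container, defined only for this illustration.
    static class Averager {
        private int total = 0;
        private int count = 0;

        // Accumulator: folds one stream element into this container; returns nothing.
        void accept(int value) {
            total += value;
            count++;
        }

        // Combiner: merges another container into this one; returns nothing.
        void combine(Averager other) {
            total += other.total;
            count += other.count;
        }

        double average() {
            return count > 0 ? (double) total / count : 0.0;
        }
    }

    static double average(List<Integer> values, boolean parallel) {
        Averager result = (parallel ? values.parallelStream() : values.stream())
                .collect(Averager::new,       // supplier
                         Averager::accept,    // accumulator
                         Averager::combine);  // combiner
        return result.average();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        System.out.println(average(values, false));
        System.out.println(average(values, true));
    }
}
```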
The groupingBy method arguments
A classification function
An instance of Collector
- Downstream collector
- Applies to the results of another collector
- Multilevel reduction: a pipeline that contains one or more downstream collectors
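A short sketch of groupingBy with a downstream collector: words are classified by length, and Collectors.counting is applied to each group. The class name GroupingDemo and the word list are illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingDemo {
    static Map<Integer, Long> countByLength(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(
                        String::length,          // classification function
                        Collectors.counting())); // downstream collector, applied per group
    }

    public static void main(String[] args) {
        System.out.println(countByLength(Arrays.asList("a", "bb", "cc", "ddd")));
    }
}
```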
The reducing method arguments
identity
- Initial value of the reduction
- Default result if there are no elements in the stream
mapper
- The reducing operation applies this mapper function to all stream elements
operation
- The operation function is used to reduce the mapped values
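The three reducing arguments can be illustrated by using Collectors.reducing as a downstream collector. The sketch below totals word lengths per first letter; it assumes every word is non-empty, and the class name ReducingDemo is made up for the example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ReducingDemo {
    // Assumes every word has at least one character.
    static Map<Character, Integer> totalLengthByFirstLetter(List<String> words) {
        Function<String, Character> firstLetter = word -> word.charAt(0);
        return words.stream().collect(
                Collectors.groupingBy(
                        firstLetter,
                        Collectors.reducing(
                                0,                // identity
                                String::length,   // mapper, applied to every element
                                Integer::sum)));  // operation, reduces the mapped values
    }

    public static void main(String[] args) {
        System.out.println(totalLengthByFirstLetter(
                Arrays.asList("apple", "avocado", "banana")));
    }
}
```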
Parallelism
- Dividing a problem into subproblems
- Solving these subproblems in parallel
- With each subproblem running in a separate thread
- Combining the results
Collections are not thread-safe
Multiple threads cannot manipulate a collection without introducing thread interference or memory consistency errors
- The Collections Framework provides synchronization wrappers
- Adds automatic synchronization to an arbitrary collection
- Makes it thread-safe
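As an illustration of a synchronization wrapper, the sketch below wraps an ArrayList with Collections.synchronizedList and appends to it from two threads. The class name SyncWrapperDemo and the two-writer setup are assumptions chosen for the example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SyncWrapperDemo {
    static int addFromTwoThreads(int perThread) {
        // The wrapper synchronizes each individual call, such as add.
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        Runnable writer = () -> {
            for (int i = 0; i < perThread; i++) {
                shared.add(i);
            }
        };
        Thread first = new Thread(writer);
        Thread second = new Thread(writer);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(addFromTwoThreads(1000));
    }
}
```

With an unwrapped ArrayList the two writers could lose updates or corrupt the list; the wrapper avoids that at the cost of contention between threads.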
Executing Streams in Parallel
- Collection.parallelStream
- BaseStream.parallel
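Both ways of obtaining a parallel stream are shown in this sketch, which computes the same sum through each. The class name ParallelDemo is illustrative:

```java
import java.util.Arrays;
import java.util.List;

class ParallelDemo {
    // Collection.parallelStream: ask the collection for a parallel stream directly.
    static long sumViaParallelStream(List<Integer> values) {
        return values.parallelStream().mapToLong(Integer::longValue).sum();
    }

    // BaseStream.parallel: switch an existing stream to parallel mode.
    static long sumViaParallel(List<Integer> values) {
        return values.stream().parallel().mapToLong(Integer::longValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(sumViaParallelStream(values));
        System.out.println(sumViaParallel(values));
        System.out.println(values.stream().parallel().isParallel());
    }
}
```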
Concurrent Reduction
The Java runtime performs a concurrent reduction if all of the following are true for a particular pipeline that contains the collect operation:
- The stream is parallel
- The parameter of the collect operation, the collector, has the characteristic Collector.Characteristics.CONCURRENT
Invoke the Collector.characteristics method to determine the characteristics of a collector
- Either the stream is unordered, or the collector has the characteristic Collector.Characteristics.UNORDERED
Invoke the BaseStream.unordered operation to ensure that a stream is unordered
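One collector that meets these conditions is Collectors.groupingByConcurrent. The sketch below inspects its characteristics and uses it on a parallel stream; the class name ConcurrentReductionDemo and the helper methods are illustrative assumptions:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ConcurrentReductionDemo {
    static Collector<String, ?, ConcurrentMap<Integer, List<String>>> byLength() {
        return Collectors.groupingByConcurrent(String::length);
    }

    // Collector.characteristics reports whether the collector is CONCURRENT and UNORDERED.
    static Set<Collector.Characteristics> characteristics() {
        return byLength().characteristics();
    }

    // Parallel stream plus a concurrent, unordered collector: all threads
    // accumulate into one shared ConcurrentMap instead of merging partial maps.
    static ConcurrentMap<Integer, List<String>> group(List<String> words) {
        return words.parallelStream().collect(byLength());
    }

    public static void main(String[] args) {
        System.out.println(characteristics());
        System.out.println(group(Arrays.asList("one", "two", "three")));
    }
}
```

Because the reduction is unordered, the order of the words inside each group is not guaranteed.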
Ordering
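- The order in which a pipeline processes elements depends on whether the stream is executed in serial or in parallel, on the source of the stream, and on the intermediate operations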
You may lose the benefits of parallelism if you use operations like forEachOrdered with parallel streams
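A sketch of the ordering trade-off on a parallel stream, comparing forEachOrdered with forEach. The class name OrderingDemo is illustrative, and a synchronized list is used so that concurrent appends in the unordered case are safe:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class OrderingDemo {
    // forEachOrdered: elements are visited in encounter order, even in parallel.
    static List<Integer> visitInEncounterOrder(List<Integer> values) {
        List<Integer> visited = Collections.synchronizedList(new ArrayList<>());
        values.parallelStream().forEachOrdered(visited::add);
        return visited;
    }

    // forEach: on a parallel stream the visiting order may differ between runs.
    static List<Integer> visitInAnyOrder(List<Integer> values) {
        List<Integer> visited = Collections.synchronizedList(new ArrayList<>());
        values.parallelStream().forEach(visited::add);
        return visited;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(visitInEncounterOrder(values));
        System.out.println(visitInAnyOrder(values));
    }
}
```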
Side Effects
- Laziness
- Interference
- Stateful Lambda Expressions
Laziness
All intermediate operations are lazy
- An expression, method, or algorithm is lazy if its value is evaluated only when it is required
- Intermediate operations do not start processing the contents of the stream until the terminal operation commences
In a pipeline such as the filter-mapToInt-average
- The average operation could obtain the first several integers from the stream created by the mapToInt operation, which obtains elements from the filter operation
- The average operation would repeat this process until it had obtained all required elements from the stream
- Then it would calculate the average
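Laziness can be observed by counting how often the filter predicate runs before and after the terminal operation. This is a minimal sketch; the class name LazinessDemo and the counter are assumptions made for the demonstration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class LazinessDemo {
    // Returns {predicate calls before the terminal operation, calls after it}.
    static int[] predicateCalls(List<Integer> values) {
        AtomicInteger calls = new AtomicInteger();
        IntStream pending = values.stream()
                .filter(v -> { calls.incrementAndGet(); return v > 0; })
                .mapToInt(Integer::intValue);
        int before = calls.get();  // the pipeline is built, but nothing has run yet
        pending.average();         // the terminal operation pulls elements through
        int after = calls.get();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] calls = predicateCalls(Arrays.asList(1, 2, 3));
        System.out.println(calls[0] + " " + calls[1]);
    }
}
```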
Interference
- Lambda expressions in stream operations should not interfere
- Interference occurs when the source of a stream is modified while a pipeline processes the stream
- Throws a ConcurrentModificationException
The pipeline begins execution when its terminal operation is invoked and ends execution when that operation completes
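A sketch of interference, assuming an ArrayList as the source: the peek lambda adds to the list while the pipeline is running. The class name InterferenceDemo is illustrative, and the exception relies on ArrayList's fail-fast behavior, which its documentation describes as best-effort:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class InterferenceDemo {
    static boolean interferes() {
        List<String> listOfStrings = new ArrayList<>(Arrays.asList("one", "two"));
        try {
            // The lambda passed to peek modifies the stream's own source.
            listOfStrings.stream()
                    .peek(s -> listOfStrings.add("three"))
                    .reduce((a, b) -> a + " " + b)
                    .get();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(interferes());
    }
}
```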
Stateful Lambda Expressions
- A stateful lambda expression is one whose result depends on any state that might change during the execution of a pipeline
- Avoid using stateful lambda expressions as parameters in stream operations
The lambda expression e -> { parallelStorage.add(e); return e; } is a stateful lambda expression
Its result can vary every time the code is run
- Customize the behavior of a particular aggregate operation
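The stateful lambda quoted above can be contrasted with a stateless alternative. In this sketch the names StatefulLambdaDemo, recordedByStatefulLambda, and statelessCopy are illustrative; parallelStorage follows the name used in the text:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class StatefulLambdaDemo {
    // Stateful: the lambda writes to shared storage, so the order of
    // parallelStorage depends on how threads happen to be scheduled.
    static List<Integer> recordedByStatefulLambda(List<Integer> values) {
        List<Integer> parallelStorage = Collections.synchronizedList(new ArrayList<>());
        values.parallelStream()
                .map(e -> { parallelStorage.add(e); return e; })
                .forEachOrdered(e -> { });
        return parallelStorage;
    }

    // Stateless alternative: let collect build the list; encounter order is kept.
    static List<Integer> statelessCopy(List<Integer> values) {
        return values.parallelStream().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(recordedByStatefulLambda(values)); // order may vary per run
        System.out.println(statelessCopy(values));
    }
}
```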