Lesson 4 - Working with Lists and Sets

1. Introduction

Lists and sets are fundamental to Haskell and functional programming in general. They allow us to manipulate collections of data in powerful, declarative ways. In this lesson, we'll explore how to work with lists efficiently and expressively, using key concepts like list comprehensions, spines, and folds. Understanding these tools will allow you to manipulate collections with a level of precision and flexibility that procedural approaches simply cannot match.

2. List Comprehensions: A Declarative Approach to Lists

In Haskell, list comprehensions provide a clean and concise way to generate and transform lists. If you’ve worked with list comprehensions in Python, the idea will be familiar, but Haskell’s version is more powerful due to the type system and the declarative nature of the language.

List comprehensions allow you to express complex operations on lists in a readable, mathematical form. You can think of them as a way to "declare" what you want the list to look like, rather than "instructing" the computer step by step on how to build it.

Syntax of List Comprehensions

At its core, a list comprehension looks like this:

[expression | element <- list, condition]

expression: Defines how each element in the resulting list will be constructed.
element <- list: Defines where the elements are drawn from.
condition: A filter that specifies which elements should be included in the resulting list.

Example 1: Simple List Comprehension

Let’s start with a basic example: squaring all the numbers in a list.

squares :: [Int] -> [Int]
squares xs = [x * x | x <- xs

In this example:

The expression x * x tells Haskell to square each element x from the list xs.
The result is a new list where each element is the square of the corresponding element in the original list.

Example 2: Adding a Condition

Now, let’s say we only want the squares of the even numbers in a list.

evenSquares :: [Int] -> [Int]
evenSquares xs = [x * x | x <- xs, even x]

Here, we’ve added a condition (even x) to filter the list before squaring the numbers. Only the elements that satisfy the condition (i.e., are even) are passed into the expression for squaring.

Real-World Example: Generating Cartesian Products

List comprehensions become even more powerful when working with multiple lists. For example, let’s generate all possible pairs from two lists—a common operation when working with Cartesian products.

cartesianProduct :: [a] -> [b] -> [(a, b)]
cartesianProduct xs ys = [(x, y) | x <- xs, y <- ys]

This list comprehension iterates over both lists and generates every possible pair (x, y) from the elements of xs and ys. You can imagine using this approach to generate test cases, grid coordinates, or combinations in larger applications.

3. Folds: Reducing Lists to a Single Value

In Haskell, folds are one of the most important ways to process and reduce lists. They allow you to take a list and "collapse" it into a single value, whether that’s a sum, a product, a concatenated string, or any other result.

At their core, folds apply a binary function (a function that takes two arguments) to a list and an accumulator, folding the list into a single result. Folds are extremely versatile and can be used to implement many common operations, like summing a list, finding the maximum element, or even recreating other list-processing functions like map or filter.

Types of Folds

There are two main types of folds in Haskell:

foldr: Fold from the right.
foldl: Fold from the left.

While they both serve the same purpose—reducing a list to a single value—their difference lies in the direction in which they process the list and how they interact with Haskell’s lazy evaluation model.

foldr: Folding from the Right

foldr stands for fold-right, and it works by processing the list from right to left. It takes three arguments:

A binary function (a function that takes two arguments).
A starting accumulator (the initial value to start folding with).
The list to be folded.

The folding process begins at the end of the list and moves toward the beginning, combining elements along the way.

Syntax of foldr:

foldr :: (a -> b -> b) -> b -> [a] -> b

This means that foldr takes:

A function of type (a -> b -> b)—the binary function to apply.
A starting value of type b.
A list of type [a] to be folded.

Let’s break this down with an example:

Example 1: Summing a List with foldr

sumList :: [Int] -> Int
sumList = foldr (+) 0

Here, we’re using foldr to sum a list of integers:

The function (+): (Int -> Int -> Int) is applied between each pair of elements.
0 is the starting value (the accumulator).
The list is processed from right to left.

The process looks something like this for the list [1, 2, 3]:

foldr (+) 0 [1, 2, 3]
-- becomes: 1 + (2 + (3 + 0))
-- result: 6

Notice how we start at the end (3 + 0), then move left, folding the result into each element along the way.

Example 2: Building a List Backwards

Another interesting use case for foldr is that you can use it to rebuild lists or other structures. For example, let’s use foldr to implement the map function:

mapWithFoldr :: (a -> b) -> [a] -> [b]
mapWithFoldr f = foldr (x xs -> f x : xs) []

Here, we use foldr to apply a function f to every element in the list, appending the result to the front of the list (using :) as we go.

foldl: Folding from the Left

foldl stands for fold-left, and it processes the list from left to right. Unlike foldr, which starts with the last element, foldl begins with the first element and works its way to the right.

Syntax of foldl:

foldl :: (b -> a -> b) -> b -> [a] -> b

The main difference here is that the binary function’s arguments are flipped—foldl first applies the function to the accumulator and then to the list element.

Example 1: Summing a List with foldl

sumListLeft :: [Int] -> Int
sumListLeft = foldl (+) 0

For the list [1, 2, 3], the process looks like this:

foldl (+) 0 [1, 2, 3]
-- becomes: ((0 + 1) + 2) + 3
-- result: 6

Why Choose foldr or foldl?

The main difference between foldr and foldl comes down to:

Direction: foldr processes from right to left, while foldl processes from left to right.
Lazy Evaluation: foldr works better with Haskell’s lazy evaluation model because it can handle infinite lists, whereas foldl requires the entire list to be evaluated before folding begins.

In most cases, you’ll prefer foldr due to its compatibility with laziness, but there are times when foldl is more efficient (especially with large lists and strict evaluation).

Real-World Examples of Folds

Now that we understand the mechanics of folds, let’s look at some real-world examples of how they can be applied in practice.

Example 1: Finding the Maximum Element

Let’s use foldr to find the maximum element in a list of integers.

maxElement :: [Int] -> Int
maxElement = foldr1 max

Here, foldr1 is a variant of foldr that doesn’t require an explicit starting value. It simply uses the first element of the list as the initial accumulator.

Example 2: Concatenating Strings

We can use foldr to concatenate a list of strings into a single string:

concatStrings :: [String] -> String
concatStrings = foldr (++) ""

This folds the list of strings using the concatenation operator (++) and an initial empty string "".

Example 3: Reversing a List

Finally, let’s use foldl to reverse a list:

reverseList :: [a] -> [a]
reverseList = foldl (acc x -> x : acc) []

This works by prepending each element to the accumulator as we move from left to right, effectively reversing the order of the list.

4. Spines: Understanding the Structure of Lists

One thing that makes lists in Haskell different from lists in many other languages is the concept of spines. The spine is the structure that holds the elements of a list together, and understanding how it works can help you write more efficient programs.

What is the Spine of a List?

A spine in Haskell represents the structure of the list—specifically, the "links" between elements. A list in Haskell is either:

Empty ([]), or
A cons cell (:) that contains a head element and a reference to the next element (which can be either another cons cell or an empty list).

You can think of the spine as the "skeleton" of the list. The elements are like "flesh" attached to the spine.

Example: Visualizing the Spine

Consider the list [1, 2, 3]. Its structure is:

1 : (2 : (3 : []))

Here, the spine is made up of the : (cons) operators linking the elements together, ending in the empty list [].

Why Do Spines Matter?

In Haskell, because of lazy evaluation, the elements of a list and the spine are evaluated separately. This means you can partially evaluate a list—perhaps you’ve calculated the spine but haven’t yet evaluated the elements. This is a powerful feature, but it can also introduce inefficiencies if you’re not careful.

For example, if you traverse a list just to calculate its length, you’ll traverse the entire spine, but you won’t need to evaluate the elements:

length :: [a] -> Int
length [] = 0
length (_:xs) = 1 + length xs

In this example, the length function only traverses the spine—it doesn’t care about the values in the list.

Understanding this distinction is key when you start working with large lists or performing partial evaluations. It helps you optimize your code by deciding when it makes sense to evaluate the spine or the elements of the list.

Efficient Use of Spines

When working with lists, understanding how Haskell evaluates the spine can help you write more efficient code. Since Haskell lists are linked lists (not arrays), operations that require traversing the spine will have a time complexity proportional to the length of the list. Some operations only need to touch the spine, while others need to evaluate both the spine and the elements.

Example: Efficient Operations

Certain operations are efficient because they only need to traverse the spine, without caring about the elements:

length: Traverses the spine to count the number of elements, but never evaluates the elements themselves.
null: Checks whether the spine is empty by looking at the first constructor ([] or :). It doesn’t care about the elements.
tail: Returns the rest of the list, which means discarding the first : in the spine. It doesn’t need to evaluate the elements.

These operations are efficient because they only operate on the structure, without needing to evaluate the values inside the list.

Potential Inefficiencies of Spines

While lazy evaluation and separate evaluation of spines and elements can make certain operations efficient, it can also lead to inefficiencies if not handled carefully.

1. Spine Traversal and Deep Structures

Because Haskell lists are linked lists, any operation that requires traversing the entire spine of the list will take O(n) time, where n is the length of the list. This means that some operations, like appending to the end of a list, are inherently inefficient because they require traversing the entire spine first.

For example, consider this operation:

appendToList :: [Int] -> [Int]
appendToList xs = xs ++ [4]

The operator (++) works by traversing the entire spine of the first list to find the end, where the second list is then appended. So, if xs is long, this operation becomes increasingly expensive, as it has to traverse the entire spine before adding 4.

2. Spine-Heavy Operations and Memory Leaks

Another potential inefficiency arises when you partially evaluate lists, leading to a phenomenon known as space leaks. In certain situations, the spine can be built up in memory without fully evaluating the elements, leading to unnecessary memory usage. This is particularly problematic in long-running programs that repeatedly construct large lists without evaluating them.

For example, take this code:

buildList :: Int -> [Int]
buildList n = [1..n] ++ [100]

If buildList is used in a context where the result is not immediately evaluated, Haskell might leave the spine unevaluated in memory, consuming space until it’s finally needed. This is called thunking — Haskell delays computation by storing "thunks" (deferred computations), which can accumulate and lead to memory bloat if left unchecked.

3. Repeated Spine Traversals

Repeatedly traversing the spine of a list can also lead to inefficiency. For example, calling length multiple times on the same list will result in multiple traversals of the spine:

list = [1..1000000]

main = do
    print $ length list   -- First traversal of the spine
    print $ length list   -- Second traversal of the spine

In this case, both length operations traverse the entire spine of the list separately, even though the result is the same. If you need to reuse results that involve traversing the spine, it’s better to cache the result (using a variable or memoization) to avoid repeated traversals.

Optimizing for Spine Efficiency

Understanding spines allows you to make informed decisions about how to work with lists efficiently in Haskell. Here are some ways to optimize performance when working with spines:

1. Use `foldr` Instead of `foldl`

When folding lists, prefer right folds (foldr) over left folds (foldl), especially when working with large or infinite lists. Since foldr works with lazy evaluation, it can stop processing once it reaches a base case, whereas foldl must traverse the entire spine first before returning a result.

For example:

sumListRight :: [Int] -> Int
sumListRight = foldr (+) 0   -- Efficient with laziness

In contrast, foldl must evaluate the entire spine, which can lead to inefficiency:

sumListLeft :: [Int] -> Int
sumListLeft = foldl (+) 0    -- Requires full spine evaluation

2. Avoid Appending with `(++)`

When building lists, avoid appending elements to the end using the (++) operator, as this requires traversing the entire spine. Instead, prefer prepending (:), which only requires modifying the first link in the spine.

-- Inefficient: Traverses the whole spine
appendEnd :: [Int] -> [Int]
appendEnd xs = xs ++ [4]

-- Efficient: Prepend to the start
prependStart :: [Int] -> [Int]
prependStart xs = 4 : xs

3. Force Evaluation When Necessary

In cases where laziness might lead to memory bloat (space leaks), you can use functions like seq or deepseq to force evaluation of the list’s elements and spine. This ensures that thunks don’t accumulate in memory.

import Control.DeepSeq

evaluateList :: [Int] -> Int
evaluateList xs = xs `deepseq` length xs

Here, deepseq forces the evaluation of both the spine and the elements, preventing memory leaks.

5. Recap & Exercises

Recap

Folds are foundational: They allow us to reduce lists into single values and are used for a wide range of operations, from summing numbers to building complex data structures.
Direction matters: foldr processes lists from right to left and is often more compatible with lazy evaluation, while foldl processes from left to right and can be more efficient with strict evaluation.
Flexibility: With folds, you can implement nearly any list-processing function, whether it’s summing, finding a maximum, reversing, or even mimicking higher-level functions like map and filter.

Exercises

Here are some exercises to help you practice working with list comprehensions, spines, and folds:

Exercise 1: List Comprehensions

Write a list comprehension that generates a list of all odd squares from 1 to 100.

oddSquares :: [Int]
-- Your implementation here

Exercise 2: Custom Fold

Using foldr, write a function that counts the number of odd numbers in a list of integers.

countOdd :: [Int] -> Int
-- Your implementation here

Exercise 3: Reverse a List

Using foldr, write a function reverses a list.

reverseFoldr :: [a] -> [a]
-- Your implementation here

Previous Chapter | Next Chapter