Rust ownership explained: merging linked lists

Share better Rust!

Contents

Understanding Rust's ownership model with singly linked lists
Breaking all of Rust's ownership rules
Merging linked lists with idiomatic Rust
Merging linked lists with recursive Rust
Discussion

"Angus", you say, "you promised us practical, better-than-production Rust. You're not about to walk through a LeetCode question, are you?".

Well... yes. But stay with me! I have a very good reason.

Coding challenges are more likely to advance your knowledge in Rust than in any other language – because Rust makes them harder than other languages.

Getting down low and hand-coding linked lists, trees and graphs sets up violent altercations with the borrow checker.

It's quite easy to hide a surface-level knowledge of Rust's ownership model if you focus on high-level packages like web frameworks. But if you're not 100% proficient with std::mem::swap, Rc, RefCell and their friends, coding challenges will weed you out.

If you aspire to be a strong systems programmer or make quality contributions to popular libraries, you need this knowledge. It should be automatic.

I have just the task to help you close those gaps using clean, idiomatic Rust.

Understanding Rust's ownership model with singly linked lists

Linked lists fascinate me. It takes a moment to appreciate why Rust's safety guarantees actually make this trivial, entry-level data structure hard to implement.

Aria Beingessner's Learning Rust With Entirely Too Many Linked Lists is one of my favourite Rust resources. If you haven't worked through it, please do – you'll leave with a fine appreciation for why the borrow checker seems to get in our way, and even less confidence in working with unsafe code. This is a good thing, since overconfidence is the enemy of unsafe.

When confronted with the following challenge, you know that Rust won't make it straightforward:

"Given two sorted, singly linked lists, merge them in sorted order, and return the sorted list."

A list is represented simply as a ListNode with an optional reference to the next node in the chain:

#[derive(PartialEq, Eq, Clone, Debug)]  pub struct ListNode {      pub val: i32,      pub next: Option<Box<ListNode>>,  }

And the signature of the merge function is as follows:

pub fn merge(      mut list1: Option<Box<ListNode>>,      mut list2: Option<Box<ListNode>>,  ) -> Option<Box<ListNode>>

Does this surprise you? It surprised me.

When working with safe representations of linked lists and trees, I always expect some flavor of Option<Rc<RefCell<Node>>>. Rc supports multiple ownership of child nodes, so you can point two heads to the same child, or access a child element without taking ownership away from its containing list. RefCell provides interior mutability, allowing us to change a node's next pointer through a shared reference.

However, Box<ListNode> means each node has a single owner. That makes it harder to do what I'd normally do in a language like Go or C++: compare node.val for the first two nodes of the input list, append the node with the lowest value to the output list, and walk down the remaining chains.

Breaking all of Rust's ownership rules

Here's a direct translation of that approach into "Rust":

pub fn merge(      mut list1: Option<Box<ListNode>>,      mut list2: Option<Box<ListNode>>,  ) -> Option<Box<ListNode>> {      let pre_head = Some(Box::new(ListNode { val: 0, next: None }));1
    let mut tail = pre_head;2
      while list1.is_some() && list2.is_some() {3
        if list1.unwrap().val < list2.unwrap().val {4
            tail.unwrap().next = list1;             list1 = list1.unwrap().next;5
        } else {              tail.unwrap().next = list2;            list2 = list2.unwrap().next;          };            tail = tail.unwrap().next;      }        tail.unwrap().next = if list1.is_some() { list1 } else { list2 };6
      pre_head.unwrap().next7
}

pre_head is declared as a dummy node that points to the real head of the output list 1. In Go, this trick allows us to avoid checking that the output list is non-nil before attempting to append new nodes – we know for sure that pre_head is non-nil, or Some in our case. At the end of the function, we return pre_head's next node 7 to get the actual head.

Will this technique carry over to Rust? I think we both know the answer to that, but let's keep the fantasy alive a little longer.

We want to keep track of the tail of the output so we can append each node directly to the end of the list. Starting from the head and traversing the whole list for each node we append would result in quadratic runtime. So I guess we just initialize tail as pre_head 2? Sure, seems legit. 🫠

While there are still nodes in both list1 and list2, we execute the main loop of the algorithm 3. If only one list has nodes left in it, we know that all of these nodes have values greater than any of the nodes we've already appended to the output. We just tack the remaining list onto the output tail 6!

The loop is quite simple. We walk both lists, updating tail as we go.

And this algorithm is correct! But the Rust is garbage.

error[E0382]: borrow of moved value: `list1`   --> src/merge_two_sorted_lists.rs:32:11    |24  |     mut list1: Option<Box<ListNode>>,    |     --------- move occurs because `list1` has type `Option<Box<ListNode>>`, which does not implement the `Copy` trait...32  |     while list1.is_some() && list2.is_some() {    |     ------^^^^^-----------------------------    |     |     |    |     |     value borrowed here after move    |     inside of this loop33  |         if list1.unwrap().val < list2.unwrap().val {    |                  -------- `list1` moved due to this method call, in previous iteration of loop34  |             tail.unwrap().next = list1;35  |             list1 = list1.unwrap().next;    |             ----- this reinitialization might get skipped    |note: `Option::<T>::unwrap` takes ownership of the receiver `self`, which moves `list1`# Many, many lines of compiler errors elidederror[E0382]: borrow of moved value: `list2`error[E0382]: use of moved value: `list1`error[E0382]: use of moved value: `list2`error[E0382]: use of moved value: `tail`error[E0382]: use of moved value: `pre_head`

The moment we assign pre_head to tail 2, it's over. We've given away ownership of pre_head, and have nothing to return at 7.

When we unwrap the heads of list1 and list2 to compare them 4, we doom our loop to failure too:

pub fn unwrap(self) -> T

Option::unwrap takes self by value. That means we take ownership of both input list heads, but only attempt to put one back 5.

And I say "attempt", because that's fubared too. If the node at the head of list1 contains the smallest value, we give ownership of list1 to tail.next (taking ownership of tail in the process). But to move the head of list1 along to its next node, we need access to list1... which we've just given away.

Merging linked lists with idiomatic Rust

Let's stop this nonsense. You understand the challenge that Rust's ownership model presents to this sort of algorithm. Let me walk you through the idiomatic way to do it.

The correct solution reminds us to be alert to problems that can be solved without reaching for heavy-duty tools like Rc and RefCell, which both have runtime overhead.

Overuse of these types is a telltale sign of an inexperienced Rust developer. Often, being smart with references and swaps achieves the desired result in a way that can be checked statically.

Behold:

pub fn merge(      mut list1: Option<Box<ListNode>>,      mut list2: Option<Box<ListNode>>,  ) -> Option<Box<ListNode>> {      let mut head = None;      let mut next_tail = &mut head;8
      while list1.is_some() && list2.is_some() {9
        let head1 = &mut list1;          let head2 = &mut list2;10
          let input_head = if head1.as_ref().unwrap().val < head2.as_ref().unwrap().val { 11
            head1          } else {              head2          };            std::mem::swap(input_head, next_tail);12
                        let next_tail_next = &mut next_tail.as_mut().unwrap().next;        std::mem::swap(input_head, next_tail_next);13
        next_tail = next_tail_next;14
    }        *next_tail = if list1.is_some() { list1 } else { list2 };15
      head  }

This is quite remarkable. It turns out that we can merge two linked lists without ever taking ownership of a node.

At 8, we initialize head as None. This isn't equivalent to pre_head. This is the spot in memory where the true head of our output list will live.

next_tail is a mutable reference to this Option::None. That is, it is a reference to the place in memory where the next node should go.

The loop condition is the same – Option::is_some() takes &self, so there's no move here 9. But this time we start each iteration by taking mutable references to the head of each list 10. We don't move ownership of the input heads, we just point to them.

We want to avoid owning a node at all costs, so to perform the comparison between values 11, we change head1 and head2 from &mut Option<Box<ListNode>> into Option<&Box<ListNode>> using Option::as_ref.

We unwrap the Option<&Box<ListNode>>s (which are guaranteed to be Some thanks to the loop condition), and compare their val fields through references.

The reference to the lesser of the two nodes is assigned to input_head, which represents the node that we want to append to our output list. At this point, input_head is either a mutable reference to the head of list1, or a mutable reference to the head of list2.

Now the fun part.

At 12, we swap the node that input_head references with the node that next_tail references. In other words, we put the node input_head points to into the space indicated by next_tail as "the place to put the next node".

Rust won't allow us to create dangling pointers – input_head must still point to something even after we put its referent into the place marked by next_tail. That's what swap is good for. We initialized next_tail as None, so we take that None, and stick it into the space referenced by input_head.

Imagine that list1 contained the smallest head on any given iteration of the loop. We've just swapped &mut list1 and next_tail. At this point, list1 is None. next_tail points to what used to be list1. head contains the merged list built up so far and the rest of the list formerly known as list1.

Not ideal. Let's fix it with another swap 13. The first argument to swap is input_head again, which is a reference to either list1 or list2. Recall that this now points to None. The second argument is a mutable reference to the unsorted tail of the list that we just swapped into next_tail: &mut next_tail.as_mut().unwrap().next. This is a reference to next_tail's tail, which we'll call next_tail_next.

Breaking it down, next_tail.as_mut() turns &mut Option<Box<ListNode>> into Option<&mut Box<ListNode>>, which we unwrap, giving us access to the next node of the current tail. Remember, this next node represents the new head of one of the input lists.

We can't move the next node through a mutable reference to its parent, of course. We just want a reference. That's why we take a &mut to it before passing next_tail_next into swap.

input_head was pointing to None (the previous value referenced by next_tail), so swapping causes next_tail_next to once again point to None.

next_tail_next was pointing to the new head of one of the input lists. This swap puts the new head back into the list it came from, since input_head is just a mutable reference to either list1 or list2, depending on which head had the smaller value.

next_tail is still Some though. If we continued the loop here, the value we just added to the output would start being swapped around in the next iteration.

Thanks to our swaps, we know that next_tail_next is None, and that this is the exact spot in memory where the next node should be placed. We can just assign next_tail_next to next_tail at the bottom of the loop 14.

By doing this, we also guarantee that next_tail will always point to None at the top of the loop, and that the final node in the output list will correctly point its next field to None.

Once one of the lists is exhausted, we just whack the remaining list onto the end of the output 15. It's safe to assign directly to *next_tail here, because we advanced next_tail at the end of the final loop iteration.

Return the head, and there you have it – an iterative, efficient solution that avoids working with owned input nodes.

Merging linked lists with recursive Rust

It's possible to solve this problem recursively, and more readably than the iterative approach.

pub fn merge(      mut list1: Option<Box<ListNode>>,      mut list2: Option<Box<ListNode>>,  ) -> Option<Box<ListNode>> {      match (list1, list2) {16
        (None, None) => None,17
        (Some(node), None) | (None, Some(node)) => Some(node),18
        (Some(mut node1), Some(mut node2)) => {19
            if node1.val < node2.val {                  node1.next = merge(node1.next, Some(node2));20
                Some(node1)              } else {                  node2.next = merge(Some(node1), node2.next);                Some(node2)              }          }      }  }

The trick here is to see that we take ownership of every node in the input lists on the way down, and give it to the output list on the way back up. There's no faffing around with borrows, because we never borrow anything.

At each recursion, we consider the heads of the two inputs lists as a tuple of nodes 16.

If both are None, then both lists are empty – we return None to take the final spot in the output list 17.

In practice, this only happens if both input lists are empty to begin with. Each iteration removes at most one node from one of the input lists. The second match arm catches the case where one list is empty and one isn't 18. In this case, we just return the matched node, and it will bring the remainder of its list with it.

The interesting case is when both heads are Some 19. We identify the node with the lowest value, and set its next field to the result of a recursive merge call, passing node.next as one input, and the entirety of the list with the larger head as the second.

Note that matching the heads of each list takes ownership. We give that ownership away when we call merge recursively 20, but the immediate reassignment to nodeN.next means that the complete node is ours to pass back up the call chain once the recursion returns.

Greater readability comes at the cost of the stack space required to hold the recursive calls (i.e. it has O(n) space complexity).

However, more expensive code that performs well enough for your use case is typically a better option than high-performance code that only half the team understands.

You're on How To Code It, so naturally you understand both, but keep this trade-off in mind when coding collaboratively.

No exercises this time – head on over to LeetCode or HackerRank and sit with the discomfort of your Rust not being quite as strong as you hoped. It's a journey we all have to take.

Share better Rust!