Cultivate oneself through humility.

What NRVO has taught me

3 minutes read Published: 2026-02-12

Recently, I had a discussion with a buddy who writes a massive amount of Rust daily about how returning a local variable by value works in C++. We started with the following:

#include <concepts>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

template <typename T>
    requires std::move_constructible<T>
class Container {
    std::vector<T> storage;
    std::size_t head = 0;  // index of the next element to pop

public:
    // Append an element, forwarding it so that rvalues are moved into storage.
    template <typename U>
        requires std::convertible_to<U, T>
    void push(U&& val) { storage.push_back(std::forward<U>(val)); }

    // Remove and return the oldest element.
    T pop() {
        if (head >= storage.size()) throw std::runtime_error("container is empty!");
        T val = std::move(storage[head]);
        ++head;
        return val;
    }
};

Let's ignore whether it's a good design or not. It is a bare-bones FIFO container. I have added the concepts so the discussion of copy construction is off the table, too. Now, the question is: How would val be returned within pop()?

Well, it turns out this is one of the tricky corners of C++. Even with -O0 on GCC 15.2 and Clang 21, val is move-constructed once and pop() hands that same object back to the caller, although the source reads as if it produced several copies. People tend to assume defensively that = means a copy; in some cases, it does not. To investigate further, you can ask Clang to emit the assembly:

clang++ -S -O0 -std=c++23 -masm=intel -o output.asm input.cpp

With the container instantiated for a test type Foo, we can find the place where the returned object is constructed:

......
.LBB0_7:
        lea     rdi, [rbp - 64]
        lea     rsi, [rbp - 40]
        call    Container<Foo>::pop()
...
.LBB0_9:
        mov     rdi, qword ptr [rbp - 72]
        mov     esi, dword ptr [rbp - 64]
        call    std::ostream::operator<<(int)@PLT
        mov     qword ptr [rbp - 80], rax
......
Container<Foo>::pop():
        push    rbp
        mov     rbp, rsp
        sub     rsp, 80
        mov     qword ptr [rbp - 64], rdi
        mov     rax, rdi
        mov     qword ptr [rbp - 56], rax
        mov     qword ptr [rbp - 8], rdi
        mov     qword ptr [rbp - 16], rsi
        mov     rdi, qword ptr [rbp - 16]
        mov     qword ptr [rbp - 48], rdi
        mov     rax, qword ptr [rdi + 24]
        mov     qword ptr [rbp - 40], rax
        call    std::vector<Foo, std::allocator<Foo>>::size() const
        mov     rcx, rax
        mov     rax, qword ptr [rbp - 40]
        cmp     rax, rcx
        jb      .LBB6_4
......
.LBB6_4:
        mov     rdi, qword ptr [rbp - 48]
        mov     byte ptr [rbp - 29], 0
        mov     rsi, qword ptr [rdi + 24]
        call    std::vector<Foo, std::allocator<Foo>>::operator[](unsigned long)
        mov     rdi, qword ptr [rbp - 64]
        mov     rsi, rax
        call    Foo::Foo(Foo&&) [base object constructor]
        mov     rax, qword ptr [rbp - 48]
        mov     rcx, qword ptr [rax + 24]
        add     rcx, 1
        mov     qword ptr [rax + 24], rcx
        mov     byte ptr [rbp - 29], 1
        test    byte ptr [rbp - 29], 1
        jne     .LBB6_6
        mov     rdi, qword ptr [rbp - 64]
        call    Foo::~Foo() [base object destructor]
.LBB6_6:
        mov     rax, qword ptr [rbp - 56]
        add     rsp, 80
        pop     rbp
        ret

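The byte at [rbp - 29], set to 0 before the construction and to 1 at the return, is a cleanup flag emitted by Clang's CodeGen: Foo::~Foo() is called on the slot only when the flag is still 0, that is, when val was not actually returned. The details are outside the scope of this blog post. The call to Foo::Foo(Foo&&) shows that val is indeed move-constructed.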

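The most important bit is the mov rdi, qword ptr [rbp - 64] just before the constructor call: it loads the address at which Foo is going to be constructed. pop() stored that address from its incoming rdi on entry (mov qword ptr [rbp - 64], rdi), and the caller originally passed it at: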

        lea     rdi, [rbp - 64]
        lea     rsi, [rbp - 40]
        call    Container<Foo>::pop()

which is a slot in the caller's own stack frame that the caller can use directly (the two [rbp - 64] offsets belong to different frames and match only by coincidence). val is therefore constructed directly in the caller's slot. Although NRVO is not mandated by the C++ standard, GCC and Clang perform it even with optimisation disabled, which is counterintuitive to a lot of people. In Clang's case, the semantic analyser inspects the function's return statements and marks the variable as an NRVO candidate.

This subtlety is why my Rust buddy was confused. In Rust, there is no distinction between a copy and a move when constructing an object. Strictly speaking, a move in Rust is always a plain bitwise copy with no user code involved, so such a distinction is unnecessary.

Our discussion deserves a post because I realised that it is sometimes very hard to argue from facts. The difficulty of successfully defending an argument depends heavily on the background and emotions of the person you are talking to.

The practical fact is that if we assume the above T is move constructible, then in almost all cases, NRVO kicks in. I have spent a whole blog post demonstrating this. However, the experienced engineer I talked to was not convinced. Having stuck with one particular paradigm, he was unfamiliar with the complexity of modern C++ compilers. The hard truth is that if engineers are unwilling to get to the bottom of things and seek the truth, the discussion turns into a dirty fight.

I started writing more, because writing is a process of arguing against myself, which is far easier than arguing against anyone else. It is great that certain languages, and even AI, have given us so much confidence that our own code will not easily break. But should we really stop caring just because the complexity is stripped away from our screens? Can we humble ourselves?