Recently, I had a discussion with a buddy who writes a massive amount of Rust daily about how return values work in C++. We started with the following:
#include <concepts>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

template <typename T>
    requires std::move_constructible<T>
class Container {
    std::vector<T> storage;
    std::size_t head = 0;  // index of the next element to pop

public:
    template <typename U>
        requires std::convertible_to<U, T>
    void push(U&& val) { storage.push_back(std::forward<U>(val)); }

    T pop() {
        if (head >= storage.size()) throw std::runtime_error("container is empty!");
        T val = std::move(storage[head]);  // move the element out of the vector
        ++head;
        return val;  // returned by name, which is what this post is about
    }
};
Let's set aside whether this is a good design; it is a bare-bones FIFO container. I added the concepts so that the discussion of copy construction is off the table, too. Now, the question is: how is val returned from pop()?
Well, it turns out this is one of the tricky corners of C++. Even with -O0 on GCC 15.2 and Clang 21, pop() move-constructs val once, directly as the returned object, rather than producing the multiple copies the code appears to make. People tend to assume defensively that = means a copy, and in some cases it does not. To investigate further, you can ask Clang to emit the assembly with these flags:
clang++ -S -O0 -std=c++23 -masm=intel -o output.asm input.cpp
In the output we can find the exact place where val is constructed:
......
.LBB0_7:
    lea rdi, [rbp - 64]
    lea rsi, [rbp - 40]
    call Container<Foo>::pop()
...
.LBB0_9:
    mov rdi, qword ptr [rbp - 72]
    mov esi, dword ptr [rbp - 64]
    call std::ostream::operator<<(int)@PLT
    mov qword ptr [rbp - 80], rax
......
Container<Foo>::pop():
    push rbp
    mov rbp, rsp
    sub rsp, 80
    mov qword ptr [rbp - 64], rdi
    mov rax, rdi
    mov qword ptr [rbp - 56], rax
    mov qword ptr [rbp - 8], rdi
    mov qword ptr [rbp - 16], rsi
    mov rdi, qword ptr [rbp - 16]
    mov qword ptr [rbp - 48], rdi
    mov rax, qword ptr [rdi + 24]
    mov qword ptr [rbp - 40], rax
    call std::vector<Foo, std::allocator<Foo>>::size() const
    mov rcx, rax
    mov rax, qword ptr [rbp - 40]
    cmp rax, rcx
    jb .LBB6_4
......
.LBB6_4:
    mov rdi, qword ptr [rbp - 48]
    mov byte ptr [rbp - 29], 0
    mov rsi, qword ptr [rdi + 24]
    call std::vector<Foo, std::allocator<Foo>>::operator[](unsigned long)
    mov rdi, qword ptr [rbp - 64]
    mov rsi, rax
    call Foo::Foo(Foo&&) [base object constructor]
    mov rax, qword ptr [rbp - 48]
    mov rcx, qword ptr [rax + 24]
    add rcx, 1
    mov qword ptr [rax + 24], rcx
    mov byte ptr [rbp - 29], 1
    test byte ptr [rbp - 29], 1
    jne .LBB6_6
    mov rdi, qword ptr [rbp - 64]
    call Foo::~Foo() [base object destructor]
.LBB6_6:
    mov rax, qword ptr [rbp - 56]
    add rsp, 80
    pop rbp
    ret
The byte at [rbp - 29] that is set from 0 to 1 belongs to a special cleanup process used within Clang's CodeGen, which is outside the scope of this blog post. The call to Foo::Foo(Foo&&) shows that val is indeed move-constructed, and that it is the only constructor call in pop().
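The assembly refers to a Foo type whose definition is not shown in this post. To check the same thing at runtime instead of in the disassembly, here is a minimal sketch with a hypothetical Foo that counts its own constructor calls. The counters, PopStats and measure_pop() are names made up for this illustration, and the Container is a trimmed copy with the concept constraints left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for Foo: the post does not show its definition.
struct Foo {
    int value;
    static inline int copies = 0;  // incremented by the copy constructor
    static inline int moves = 0;   // incremented by the move constructor

    explicit Foo(int v) : value(v) {}
    Foo(const Foo& other) : value(other.value) { ++copies; }
    Foo(Foo&& other) noexcept : value(other.value) { ++moves; }
    Foo& operator=(const Foo&) = default;
    Foo& operator=(Foo&&) noexcept = default;
    ~Foo() {}  // user-provided, so Foo is not trivially destructible
};

// Trimmed copy of the post's Container (concept constraints omitted).
template <typename T>
class Container {
    std::vector<T> storage;
    std::size_t head = 0;

public:
    void push(T val) { storage.push_back(std::move(val)); }

    T pop() {
        if (head >= storage.size()) throw std::runtime_error("container is empty!");
        T val = std::move(storage[head]);
        ++head;
        return val;
    }
};

struct PopStats {
    int copies;
    int moves;
    int value;
};

// Counts the constructor calls caused by a single pop().
PopStats measure_pop() {
    Container<Foo> c;
    c.push(Foo{42});
    Foo::copies = 0;  // ignore whatever push() and the vector did
    Foo::moves = 0;
    Foo out = c.pop();
    assert(out.value == 42);  // the payload survived the move
    return PopStats{Foo::copies, Foo::moves, out.value};
}
```

Calling measure_pop() from main and printing the fields should report zero copies. With GCC or Clang at their defaults I would expect exactly one move, matching the single Foo::Foo(Foo&&) call in the assembly; a build with -fno-elide-constructors should report two, because the return then needs its own move.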
The most important bit is where Foo is constructed: the destination pointer loaded by mov rdi, qword ptr [rbp - 64] inside pop(). That slot holds the rdi that pop() received on entry, which the caller originally passed at:
    lea rdi, [rbp - 64]
    lea rsi, [rbp - 40]
    call Container<Foo>::pop()
which is storage in the caller's own frame that the caller can use directly. In other words, val is constructed directly in the caller's slot. This is named return value optimisation (NRVO). Although the C++ standard permits NRVO rather than enforcing it, compilers commonly perform it even with the optimisation flag disabled, which is counterintuitive to a lot of people. In this example with Clang, the semantic analyser reads the function and inspects its return paths to attach the NRVO flag to the variable.
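The "caller's slot" claim can also be observed without reading assembly. The following is a sketch under assumptions, not the code that produced the listing above: Tracked, take(), last_local_address and constructed_in_callers_slot() are hypothetical names, and take() only mirrors the shape of pop(). It records the address of the local inside the function and compares it with the address of the object the caller ends up with.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical non-trivial type: on the common ABIs, the user-provided move
// constructor and destructor make it be returned through memory rather than
// in registers, so its address is meaningful.
struct Tracked {
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(Tracked&& other) noexcept : value(other.value) {}
    ~Tracked() {}
};

// Address of the local `val` inside take(), stored as an integer so that
// comparing it later never uses a dangling pointer.
inline std::uintptr_t last_local_address = 0;

// Same shape as pop(): move out of a source, then return the local by name.
Tracked take(Tracked& source) {
    Tracked val = std::move(source);
    last_local_address = reinterpret_cast<std::uintptr_t>(&val);
    return val;
}

// True when `val` and the caller's `out` occupied the same storage.
bool constructed_in_callers_slot() {
    Tracked source(7);
    Tracked out = take(source);
    assert(out.value == 7);
    return last_local_address == reinterpret_cast<std::uintptr_t>(&out);
}
```

When NRVO is applied, constructed_in_callers_slot() returns true, because val and out are one and the same object. The result is compiler-dependent by design: the standard allows the elision but does not require it, so a false here is also conforming.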
This subtlety is why my Rust buddy was confused. In Rust, there is no distinction between a copy and a move when constructing an object. In fact, to be stricter about definitions, a move in Rust is so trivial (a bitwise copy that leaves the source unusable) that such a distinction is not necessary.
Our discussion deserves a post because I realised that it is sometimes very hard to argue from facts. The difficulty of successfully defending an argument depends heavily on the background and emotions of the person you are talking to.
The practical fact is that if we assume the above T is move constructible, then in almost all cases, NRVO kicks in. I have spent a whole blog post demonstrating this. However, the experienced engineer I talked to was not convinced. Having stuck with one particular paradigm, he was unfamiliar with the complexity of modern C++ compilers. The hard truth is that if engineers are unwilling to get to the bottom of things and seek the truth, then the discussion turns into a dirty fight.
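To make "almost all cases" concrete, here is one case where NRVO cannot apply: wrapping the returned name in std::move. This is a small sketch with hypothetical names (Probe, take_plain, take_forced_move and the two counting helpers are made up for illustration); it counts move constructions for both spellings of the return statement.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts its move constructions.
struct Probe {
    static inline int moves = 0;
    Probe() = default;
    Probe(Probe&&) noexcept { ++moves; }
};

// `return val;` names a local of the return type, so NRVO is permitted.
Probe take_plain(Probe& source) {
    Probe val = std::move(source);
    return val;
}

// `return std::move(val);` is no longer a bare name, so the elision is not
// permitted and the return object is always move-constructed from val.
Probe take_forced_move(Probe& source) {
    Probe val = std::move(source);
    return std::move(val);
}

int moves_with_plain_return() {
    Probe source;
    Probe::moves = 0;
    Probe out = take_plain(source);
    (void)out;
    return Probe::moves;  // 1 if NRVO was applied, 2 otherwise
}

int moves_with_forced_move() {
    Probe source;
    Probe::moves = 0;
    Probe out = take_forced_move(source);
    (void)out;
    return Probe::moves;  // 2: one for val, one for the return object
}
```

The second function is the version people write when they are being defensive, and it is the one that costs an extra move. GCC and Clang may warn about it as a pessimizing move when warnings are enabled.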
I started writing more, because writing is a process of arguing against myself, which is exponentially easier than against anyone else. It is so great that certain languages and even AI have given us so much confidence that our own code will not easily break. But should we really stop caring just because the complexity is stripped away from our screens? Can we humble ourselves?