This article accompanies Lesson 10, Function Calls, and Lesson 11, Stack Operations, of LaurieWired's ARM64 assembly tutorial.

The stack

When one function calls another, the data that the calling function still needs after the call is typically pushed onto the stack.

For example, suppose that inside function f, just before it calls function g, the stack pointer register sp holds the value 0x1000, and the memory at address 0x1000 holds some value or nothing of interest. We usually say that sp points at address 0x1000:

sp -> 0x1000     [empty or some value]

Suppose f has two 64-bit registers whose values it wants to keep, so that it still has access to them after g returns. Conceptually, it decrements sp by 0x8 twice and stores the two 64-bit values at the new addresses, each occupying 8 bytes:

      0x1000    [empty or some value]
      0x0ff8    [value2 of f]
sp -> 0x0ff0    [value1 of f]

The function g can keep decrementing sp and store more data at addresses below 0x0ff0. But by the time it returns to f, it must have incremented sp back up to 0x0ff0.

Back in f, conceptually, f loads the value sp points at into its original register, increments sp by 0x8, and repeats this for the second value, until the stack again looks like this:

sp -> 0x1000     [empty or some value]
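
The push and pop sequence above can be modeled in a few lines of Python. This is only a sketch of the idea, not of real machine behavior: memory is a hypothetical dictionary from address to value, the helper names push64 and pop64 are made up for illustration, and the 16-byte alignment rule discussed next is ignored.

```python
# Toy model of the conceptual push/pop described above.
memory = {}      # hypothetical memory: address -> 64-bit value
sp = 0x1000      # stack pointer in f before anything is saved


def push64(value):
    """Decrement sp by 8, then store value at the new address."""
    global sp
    sp -= 0x8
    memory[sp] = value


def pop64():
    """Load the value sp points at, then increment sp by 8."""
    global sp
    value = memory.pop(sp)
    sp += 0x8
    return value


push64("value2 of f")    # stored at 0x0ff8
push64("value1 of f")    # stored at 0x0ff0
sp_at_call = sp
print(hex(sp_at_call))   # 0xff0

# ... g runs here and must leave sp at 0x0ff0 when it returns ...

value1 = pop64()
value2 = pop64()
print(hex(sp))           # 0x1000
```

The values come back in the reverse order of the pushes, which is why the last value saved is the first one restored.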

ARM64 stack is 16-byte aligned

The ARM64 stack is 16-byte aligned, meaning that the value stored in sp, interpreted as an address, must be a multiple of 16, or 0x10, whenever sp is used as the base address of a load or store.

So in the above example, in practice we cannot write

sub sp, sp, #0x8
str x2, [sp]
sub sp, sp, #0x8
str x1, [sp]

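because after the first decrement sp would point at 0x0ff8, which is not a multiple of 0x10. On systems that enable stack-pointer alignment checking, as Linux does for user programs, the store through the misaligned sp raises an alignment fault. Instead, we decrement sp by 0x10 in one go: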

sub sp, sp, #0x10
str x2, [sp, #0x8]
str x1, [sp]

These three instructions can be combined into a single store-pair instruction with pre-indexed addressing:

stp x1, x2, [sp, #-0x10]!

See "Aarch64 part 3: addressing mode" in the References for an explanation of the above line.
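
As a rough check that the single stp does the same work as the three-instruction sequence, here is a small Python model. It is a sketch under stated assumptions: registers and memory are plain dictionaries, the function names are hypothetical, and the register values 11 and 22 are arbitrary.

```python
# Toy model: registers and memory are plain dictionaries (illustration only).
def run_three_instructions(regs, mem):
    regs["sp"] -= 0x10                    # sub sp, sp, #0x10
    mem[regs["sp"] + 0x8] = regs["x2"]    # str x2, [sp, #0x8]
    mem[regs["sp"]] = regs["x1"]          # str x1, [sp]


def run_stp_pre_index(regs, mem):
    # stp x1, x2, [sp, #-0x10]!
    addr = regs["sp"] - 0x10              # pre-index: compute the address first
    mem[addr] = regs["x1"]                # first register goes to the lower address
    mem[addr + 0x8] = regs["x2"]          # second register goes 8 bytes above it
    regs["sp"] = addr                     # "!" writes the new address back to sp


start = {"sp": 0x1000, "x1": 11, "x2": 22}   # arbitrary example values
regs_a, mem_a = dict(start), {}
regs_b, mem_b = dict(start), {}
run_three_instructions(regs_a, mem_a)
run_stp_pre_index(regs_b, mem_b)

same = regs_a == regs_b and mem_a == mem_b
print(same)                                   # True
print({hex(a): v for a, v in sorted(mem_b.items())})
```

In both cases x1 ends up at the new sp and x2 at sp + 8, so sp never passes through the misaligned value 0x0ff8.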

If there is only one 64-bit register to save, one still has to observe the 16-byte alignment, which leaves some of the reserved memory unused, for example,

sub sp, sp, #0x10
str x1, [sp]

The 8 bytes at addresses sp + 8 through sp + 15 are left unused as padding.
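
The amount of stack to reserve follows from rounding the needed bytes up to the next multiple of 16. The helper below is a hypothetical illustration of that arithmetic, not part of any toolchain; it assumes every saved register is 8 bytes wide.

```python
def aligned_frame_size(n_registers, reg_bytes=8, alignment=16):
    """Smallest multiple of `alignment` that can hold n_registers registers."""
    needed = n_registers * reg_bytes
    return (needed + alignment - 1) // alignment * alignment


for n in (1, 2, 3):
    size = aligned_frame_size(n)
    padding = size - n * 8
    print(n, hex(size), padding)   # registers saved, bytes reserved, unused bytes
```

For one register this gives 0x10 bytes with 8 bytes of padding, for two registers 0x10 bytes with none, and for three registers 0x20 bytes with 8 bytes of padding.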

Registers that must be preserved by the callee function

When one function (the caller) calls another (the callee), the callee must preserve the contents of registers x19-x29 and sp: just before the callee executes ret, x19-x29 and sp must hold the same values they held on entry.

Function call with stack operation

func.s:

.global _start

_start:
    mov x0, #1
    mov x1, #2

    stp x0, x1, [sp, #-16]!      // store pair x0 x1
                                 // store x0 to sp - 16
                                 // store x1 to sp - 8
                                 // and set sp to sp - 16
    bl add_nums
    ldp x0, x1, [sp], #16        // load pair x0 x1
                                 // load x0 from sp
                                 // load x1 from sp + 8
                                 // and set sp to sp + 16

    mov x8, #0x5d
    svc #0

add_nums:
    add x0, x0, x1
    ret

After the call to add_nums, x0 holds 3, but the ldp instruction then restores it to the old value 1, which becomes the exit status of the program.

$ as -o func.o func.s
$ gcc -o func func.o -nostdlib -static
$ ./func; echo $?
1

Generic prologue and clean-up code around function call

If function calls are nested, the link register lr, which holds the return address, must be saved on the stack so that no return address is overwritten. The frame pointer fp is saved along with it.

"Aarch64 part 24: code walkthrough" (see References) gives generic prologue and clean-up code for a function as an example. Recall that x19-x29 and sp must be preserved by the callee function.

Prologue:

stp     x19, x20, [sp,#-0x20]!     // x19-x29 must be preserved
str     x21, [sp,#0x10]
stp     fp, lr, [sp,#-0x10]!
mov     fp, sp

Clean-up:

ldp     fp, lr, [sp], #0x10
ldr     x21, [sp, #0x10]
ldp     x19, x20, [sp], #0x20
ret

In the above example, once the prologue has run and before any other code of the function executes, the stack looks as follows, with higher addresses at the top:

[empty 64-bit]
x21
x20
x19
lr
fp                <--- the new stack pointer sp

The clean-up code unwinds it again in reverse order, leaving sp where it was on entry.
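
To double-check the layout and the symmetry between prologue and clean-up, the two sequences can be replayed in a toy Python model. This is a sketch only: the helper names, the starting sp of 0x1000, and all register values are assumptions chosen for illustration.

```python
# Hypothetical register file on entry to the function; values are arbitrary.
regs = {"sp": 0x1000, "fp": 0xF00D, "lr": 0xCA11, "x19": 19, "x20": 20, "x21": 21}
mem = {}                       # toy memory: address -> 64-bit value
on_entry = dict(regs)


def stp_pre(r1, r2, offset):   # stp r1, r2, [sp, #offset]!
    regs["sp"] += offset
    mem[regs["sp"]] = regs[r1]
    mem[regs["sp"] + 8] = regs[r2]


def str_off(r, offset):        # str r, [sp, #offset]
    mem[regs["sp"] + offset] = regs[r]


def ldp_post(r1, r2, offset):  # ldp r1, r2, [sp], #offset
    regs[r1] = mem[regs["sp"]]
    regs[r2] = mem[regs["sp"] + 8]
    regs["sp"] += offset


def ldr_off(r, offset):        # ldr r, [sp, #offset]
    regs[r] = mem[regs["sp"] + offset]


# Prologue
stp_pre("x19", "x20", -0x20)
str_off("x21", 0x10)
stp_pre("fp", "lr", -0x10)
regs["fp"] = regs["sp"]        # mov fp, sp
sp_after_prologue = regs["sp"]

for addr in sorted(mem, reverse=True):   # print from high to low addresses
    print(hex(addr), hex(mem[addr]))

# The function body is free to change the saved registers.
regs["x19"] = regs["x20"] = regs["x21"] = 0

# Clean-up
ldp_post("fp", "lr", 0x10)
ldr_off("x21", 0x10)
ldp_post("x19", "x20", 0x20)

restored = regs == on_entry
print(restored)                # True
```

In this model the slot at 0x0ff8 is never written, which corresponds to the empty 64-bit entry at the top of the diagram.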

References

ARM64 assembly tutorial, LaurieWired, https://www.youtube.com/playlist?list=PLn_It163He32Ujm-l_czgEBhbJjOUgFhg.

Introduction to Aarch64 architecture, 8. The Stack, https://hrishim.github.io/llvl_prog1_book/stack.html.

Aarch64 part 3: addressing mode, https://devblogs.microsoft.com/oldnewthing/20220728-00/?p=106912.

Aarch64 part 24: code walkthrough, https://devblogs.microsoft.com/oldnewthing/20220829-00/?p=107066.

First ARM64 assembly program, /2025/07/13/first-arm64-code.html.

ARM64 load/store register instructions, /2025/07/14/arm64-ldr-and-str.html.

Arm Compiler armasm User Guide. On https://developer.arm.com, search for “armasm user guide”. In the result list, find the latest version of “Arm Compiler armasm User Guide”.