ARM64 floating point arithmetic
ARM64 floating point registers
AArch64 has 32 floating-point registers. Each is 128 bits wide and is named Q0-Q31 (the procedure call standard refers to the same registers as v0-v31). If less width is needed, for example 32 bits, the low part of each register can be accessed under a different name. For example:
Q0 | 128-bit |
D0 | the lowest 64-bit part of Q0 |
S0 | the lowest 32-bit part of Q0 |
H0 | the lowest 16-bit part of Q0 |
B0 | the lowest 8-bit part of Q0 |
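The overlap between these names can be sketched in C. This is only an illustrative model, not how the hardware is programmed: the type qreg_model and the view_* helpers are hypothetical names, and the sketch covers reads only (on real hardware, writing a narrower view such as S0 also clears the upper bits of the register).

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit register as 16 bytes of storage.
   bytes[0] holds the least significant 8 bits. */
typedef struct {
    uint8_t bytes[16];
} qreg_model;

/* Combine the lowest `count` bytes into one integer, lowest byte first. */
uint64_t low_bytes(const qreg_model *q, int count) {
    uint64_t value = 0;
    for (int i = count - 1; i >= 0; i--) {
        value = (value << 8) | q->bytes[i];
    }
    return value;
}

/* Each view reads the low part of the same storage. */
uint8_t  view_b(const qreg_model *q) { return (uint8_t)low_bytes(q, 1); }
uint16_t view_h(const qreg_model *q) { return (uint16_t)low_bytes(q, 2); }
uint32_t view_s(const qreg_model *q) { return (uint32_t)low_bytes(q, 4); }
uint64_t view_d(const qreg_model *q) { return low_bytes(q, 8); }
```

Filling the model with distinct bytes and calling each helper shows that every narrower view is a prefix of the wider one, which is the relationship the table describes.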
Below is a graphical illustration,
Floating point registers preserved by the callee function
Certain registers must hold the same values when a called function or subroutine (the callee) returns as they held on entry.
The first eight registers, v0-v7, are used to pass argument values into a subroutine and to return result values from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).
Registers v8-v15 must be preserved by a callee across subroutine calls; the remaining registers (v0-v7, v16-v31) do not need to be preserved (or should be preserved by the caller). Additionally, only the bottom 64 bits of each value stored in v8-v15 need to be preserved; it is the responsibility of the caller to preserve larger values.
(Source: Procedure Call Standard for AArch64, section 6.1.2, SIMD and Floating-Point registers.)
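The rule quoted above can be written down as a small lookup. This is a minimal sketch of the quoted text only; the function names are hypothetical and are not part of any ABI header.

```c
#include <stdbool.h>

/* True if vN is used to pass arguments and return results (v0-v7). */
bool is_argument_v(int n) {
    return n >= 0 && n <= 7;
}

/* True if the callee must preserve vN across the call (v8-v15). */
bool is_callee_saved_v(int n) {
    return n >= 8 && n <= 15;
}

/* Number of low bits of vN the callee must preserve:
   64 for v8-v15, 0 for every other register. */
int preserved_bits_v(int n) {
    return is_callee_saved_v(n) ? 64 : 0;
}
```

For the add routine in this post, only v0 and v1 (as s0 and s1) are touched, so by this rule nothing has to be saved.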
Project Overview
In previous posts, the result was an integer and was returned through the Linux exit() call. In this post the result is a floating-point value, so we change the game plan: we'll use the C function printf() to print it out.
We first write add.h and main.c:
add.h:
float add(float a, float b);
main.c:
#include "add.h"
#include <stdio.h>
int main(void) {
    printf("%f\n", add(3.4, -2.7));
    return 0;
}
What remains is to write add.s, then assemble it and link it with main.c to produce the executable file.
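Before writing the assembly, it can help to know what output to expect. The sketch below is a plain C stand-in for the routine; add_ref and format_sum are hypothetical names used only for this check and are not part of the project.

```c
#include <stdio.h>

/* Hypothetical C reference for the assembly routine:
   a single-precision addition of the two arguments. */
float add_ref(float a, float b) {
    return a + b;
}

/* Format the sum with "%f", the same conversion main.c uses.
   Returns the number of characters snprintf would write. */
int format_sum(char *buf, size_t size, float a, float b) {
    return snprintf(buf, size, "%f", add_ref(a, b));
}
```

Calling format_sum with 3.4f and -2.7f gives the text the finished program should print, which makes it easy to tell an assembly bug from a formatting surprise.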
The add function in assembly
Recall that immediate constants are preceded by #. #0x10 is hexadecimal 10, which equals decimal 16. #12 is simply decimal 12.
The C float type is typically 32 bits wide. In C, the interface of add() was declared as
float add(float a, float b);
On entry to the add routine in the assembly, the input parameters a and b are in the 32-bit registers s0 and s1. The result is returned in s0. Neither register has to be preserved by the callee.
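To see what a 32-bit register such as s0 actually holds, the bit pattern of a float can be inspected from C. This is a minimal sketch that assumes float is the 32-bit IEEE 754 single-precision format, which is typical but not guaranteed by the C standard; float_bits is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

/* The sketch only makes sense if float occupies exactly 32 bits. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Return the raw 32 bits that represent x.
   memcpy copies the bytes without converting the value. */
uint32_t float_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Under that assumption, float_bits(3.4f) is 0x4059999A, the 32-bit pattern that s0 holds on entry to add when main.c passes 3.4.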
Refer to the ARM64 function call post for the use of the stack.
add.s:
    .global add
    .text
add:
    sub sp, sp, #0x10      // set sp to sp - 16 bytes
    str lr, [sp, #8]       // store link register lr at sp + 8 bytes
    str fp, [sp]           // store frame pointer fp at sp
    mov fp, sp             // set frame pointer fp to sp
    fadd s0, s0, s1        // floating-point add: s0 = s0 + s1
                           // the result is left in s0
    ldr fp, [sp]           // restore frame pointer fp from sp
    ldr lr, [sp, #8]       // restore link register lr from sp + 8 bytes
    add sp, sp, #0x10      // set sp to sp + 16 bytes
    ret
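The 16 bytes reserved by the routine above can be pictured as a C struct. This is only an illustration of the offsets used by the str and ldr instructions; the struct and constant names are hypothetical, and nothing in the project declares such a type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical picture of the stack area that add.s reserves.
   The struct starts at the new value of sp. */
struct add_frame {
    uint64_t saved_fp;   /* written by: str fp, [sp]     */
    uint64_t saved_lr;   /* written by: str lr, [sp, #8] */
};

enum {
    SAVED_FP_OFFSET = offsetof(struct add_frame, saved_fp),
    SAVED_LR_OFFSET = offsetof(struct add_frame, saved_lr),
    FRAME_SIZE      = sizeof(struct add_frame)   /* the #0x10 in sub/add */
};
```

The offsets match the immediates in the assembly: 0 for fp, 8 for lr, and a total size of 0x10 bytes, which also keeps sp a multiple of 16.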
Run the project
We assemble add.s into an object file add.o:
$ as -o add.o add.s
Then we compile the remaining C files and link them with add.o to produce the final executable file main:
$ gcc -o main main.c add.o
We run it:
$ ./main
0.700000
References
Arm Compiler armasm User Guide. On https://developer.arm.com, search for “armasm user guide”. In the result list, find the latest version of “Arm Compiler armasm User Guide”.
AArch64 registers, https://developer.arm.com/documentation/102374/0102/Registers-in-AArch64---general-purpose-registers
Procedure Call Standard for Aarch64, https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
ARM64 load/save register instructions, /2025/07/14/arm64-ldr-and-str.html
ARM64 function calls, /2025/08/04/arm64-func.html.
Introduction to Aarch64 architecture, 8. The Stack, https://hrishim.github.io/llvl_prog1_book/stack.html.