Extending gem5 for Arm MTE – Part 1

Advertising Arm MTE Support in gem5

This is the first post in a series on extending gem5 to support Arm’s Memory Tagging Extension (MTE), and we are not just going to do the “copy these steps and it works” routine. We will also look at the reasoning behind each step. Sometimes that means diving deep into gem5’s internals, like its ISA description language, decode logic, and register model. Other times we will zoom out to think like a CPU architect, exploring how you would approach modeling a new feature from scratch.

By the end of the series, MTE will not just be running in your gem5 build. You will also know how to approach any architectural extension, understand where it fits in the simulator, and feel confident navigating gem5’s framework without getting lost.

This short post focuses only on: (1) advertising Arm MTE in gem5 by adding FEAT_MTE and FEAT_MTE2, and (2) running a quick Full-System experiment to see how the kernel reacts, and to uncover what’s still missing.

FEAT_* Codes in Arm

Arm evolves its ISA in modular chunks called features. Each feature has a short name, or mnemonic, such as FEAT_SVE for the Scalable Vector Extension, FEAT_AES for AES encryption instructions, or FEAT_MTE for the Memory Tagging Extension. Instead of making software guess which instructions a CPU understands (and risk a crash if it guesses wrong), the architecture provides feature registers that software can read to discover what is supported. In AArch64, one of the main places a CPU advertises these features is in the Processor Feature Registers.

We’re interested in FEAT_MTE and FEAT_MTE2 because they’re the architectural way of saying: “this CPU can do memory tagging.” In this post, we’ll wire up gem5 so it advertises MTE support.

FEAT_MTE vs. FEAT_MTE2

The “vanilla” FEAT_MTE (introduced in Armv8.5-A, MTE = 0b0001) is what Arm calls instruction-only MTE. It gives you the tag instructions (IRG, ADDG, STG, …), but doesn’t guarantee in-memory allocation tags or the full synchronous/asynchronous tag checking that Linux expects (we’ll get into the details of these things in future posts). Modern Linux kernels only advertise MTE in /proc/cpuinfo and set the HWCAP2_MTE flag if the MTE field is at least 0b0010, which means FEAT_MTE2 or newer. FEAT_MTE2 (MTE = 0b0010, also optional from Armv8.5-A) is basically MTE with the training wheels off: allocation tags are mandatory, synchronous checking is always there, and you get extra knobs for fault reporting and handling. So if our goal is to boot Linux in gem5 FS mode and see mte show up in /proc/cpuinfo, we’ll need to model (or at least advertise) FEAT_MTE2. That’s the point where the kernel says “yep, MTE is really here” and tells user space about it.
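
To make the threshold concrete, here is a minimal Python sketch of the rule described above. It is not kernel code: the helper names (`describe_mte_field`, `linux_reports_mte`) are made up for illustration, and the only encodings it knows are the two mentioned in this post.

```python
# MTE field encodings mentioned in this post (4-bit field).
MTE_LEVELS = {
    0b0000: "not implemented",
    0b0001: "FEAT_MTE (instruction-only)",
    0b0010: "FEAT_MTE2",
}

# Per the text above: Linux reports MTE only from this field value upwards.
LINUX_MIN_MTE_FIELD = 0b0010


def describe_mte_field(value: int) -> str:
    """Return a readable name for a 4-bit MTE field value (hypothetical helper)."""
    if not 0 <= value <= 0xF:
        raise ValueError("MTE is a 4-bit field")
    return MTE_LEVELS.get(value, "newer than FEAT_MTE2 (not covered here)")


def linux_reports_mte(value: int) -> bool:
    """Mirror the 'at least 0b0010' rule described in the text."""
    return value >= LINUX_MIN_MTE_FIELD


for field in (0b0000, 0b0001, 0b0010):
    print(f"{field:#06b}: {describe_mte_field(field)}, "
          f"visible to Linux user space: {linux_reports_mte(field)}")
```

The takeaway is that advertising 0b0001 alone would leave /proc/cpuinfo unchanged, which is why both features are added in this post.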

ID_AA64PFR1_EL1 Register

ID_AA64PFR1_EL1 is AArch64 Processor Feature Register 1. It is part of the Processor Feature Register (PFR) family, a set of read-only system registers that report the presence (and sometimes the version) of high-level architectural features to software. These registers are 64 bits wide and are divided into fixed-size bitfields, each assigned to a specific feature.

The PFR family includes ID_AA64PFR0_EL1 and ID_AA64PFR1_EL1 (recent revisions of the architecture also define ID_AA64PFR2_EL1). This “numbered register” approach is common in AArch64 and also exists for other categories, such as the Instruction Set Attribute Registers (ID_AA64ISARn_EL1), Memory Model Feature Registers (ID_AA64MMFRn_EL1), and Debug Feature Registers (ID_AA64DFRn_EL1). Each category focuses on a different area of the architecture.

For MTE specifically, the mte field in ID_AA64PFR1_EL1 occupies bits [11:8]. If this field is non-zero, software can assume that the core implements the basic MTE instruction set and state.
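
As a quick illustration of what “occupies bits [11:8]” means in practice, the sketch below pulls that field out of a raw 64-bit register value. The function names and the sample value are illustrative only; on real hardware the value would come from an MRS read of ID_AA64PFR1_EL1.

```python
def extract_field(reg_value: int, msb: int, lsb: int) -> int:
    """Return bits [msb:lsb] of reg_value, shifted down to bit 0."""
    width = msb - lsb + 1
    return (reg_value >> lsb) & ((1 << width) - 1)


# Bit positions of the mte field, as stated in the text.
MTE_MSB, MTE_LSB = 11, 8


def mte_field(id_aa64pfr1_el1: int) -> int:
    """Extract the mte field from a raw ID_AA64PFR1_EL1 value."""
    return extract_field(id_aa64pfr1_el1, MTE_MSB, MTE_LSB)


# Made-up register value with only the mte field set to 0b0010.
sample = 0x0000_0000_0000_0200
print(mte_field(sample))  # 2
```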

Tell gem5 that FEAT_MTE exists

We start by modifying src/arch/arm/ArmSystem.py. This file lives in gem5’s Python config layer for Arm, which is the middle layer between the low-level C++/ISA code and the higher-level simulation scripts. While most of the simulator’s CPU and ISA behavior lives in C++ and the ISA Domain Specific Language (DSL), ArmSystem.py is where we describe, in Python, what an Arm-based system looks like to the rest of gem5. It’s where you’ll find things like:

  • The ArmSystem SimObject and its parameters (CPU type, memory ranges, etc.).
  • The list of CPU “extensions” that control which architectural features the simulated CPU claims to support.
  • Setup hooks for the memory map, GIC, and other platform components.

The important bit for us: this is where gem5 decides what features your CPU will say it supports. Adding FEAT_MTE and FEAT_MTE2 to the default extension list here is the first step in making your simulated CPU advertise MTE. Without it, the guest software will never “see” MTE, even if you later add all the MTE instructions and registers.

Add FEAT_MTE and FEAT_MTE2 to the ArmExtension enum. This enum is the list of default architectural features recognized by gem5’s Arm ISA model. Since MTE was introduced in Armv8.5 and is present in later revisions, we’ll add it under the group of features for Armv8.5 as well.

class ArmExtension(ScopedEnum):
    vals = [
        ...
        # Armv8.5
        "FEAT_FLAGM2",
        "FEAT_RNG",
        "FEAT_RNG_TRAP",
        "FEAT_EVT",
        "FEAT_MTE",  # <--
        "FEAT_MTE2", # <-- (is OPTIONAL from Armv8.5)
        ...
    ]

class Armv85(Armv84):
    extensions = Armv84.extensions + [
        "FEAT_FLAGM2",
        "FEAT_RNG",
        "FEAT_RNG_TRAP",
        "FEAT_EVT",
        "FEAT_MTE",  # <--
        "FEAT_MTE2"  # <--
    ]
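
If the `Armv84.extensions + [...]` pattern looks unfamiliar, the standalone sketch below shows the idea without importing gem5. The class names ending in `Sketch`, the `has` helper, and the placeholder entry are all made up for illustration; only the Armv8.5 feature names come from the snippet above.

```python
class ReleaseSketch:
    """Stand-in for a release class: just a list of feature names."""
    extensions: list = []

    @classmethod
    def has(cls, ext: str) -> bool:
        return ext in cls.extensions


class Armv84Sketch(ReleaseSketch):
    # Placeholder entry, not the real Armv8.4 list from gem5.
    extensions = ["FEAT_EARLIER_PLACEHOLDER"]


class Armv85Sketch(Armv84Sketch):
    # '+' builds a new list, so the parent's list is left untouched.
    extensions = Armv84Sketch.extensions + [
        "FEAT_FLAGM2",
        "FEAT_RNG",
        "FEAT_RNG_TRAP",
        "FEAT_EVT",
        "FEAT_MTE",
        "FEAT_MTE2",
    ]


print(Armv85Sketch.has("FEAT_MTE2"))  # True
print(Armv84Sketch.has("FEAT_MTE2"))  # False
```

Each release inherits everything from the previous one and appends its own features, so adding MTE at the Armv8.5 level makes it visible to every later release built on top of it.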

In gem5, the default feature set isn’t shared between simulation modes. Full-System (FS) mode uses ArmDefaultRelease, while Syscall Emulation (SE) mode uses ArmDefaultSERelease. In FS mode, the kernel runs at real exception levels, pokes at system registers like ID_AA64PFR1_EL1, and decides what features it can use. If FEAT_MTE isn’t in ArmDefaultRelease, the kernel simply won’t see it. SE mode is lighter weight, no kernel, just your program with gem5 faking the syscalls. But SE still has its own feature list (ArmDefaultSERelease), which it uses to fill in those same ID registers when user-space asks about them. If MTE’s not on that list, your SE test code will think it’s missing.

We’ll put FEAT_MTE and FEAT_MTE2 in both places. That way your code sees MTE whether you’re running a tiny SE-mode test or a full OS in FS mode.

class ArmDefaultRelease(Armv8):
    extensions = Armv8.extensions + [
        ...
        # Armv8.5
        "FEAT_FLAGM2",
        "FEAT_EVT",
        "FEAT_MTE",   # <--
        "FEAT_MTE2",  # <--
        ...
    ]

ArmDefaultSERelease lives in ArmISA.py, which sits closer to the CPU model. It defines ISA-specific SimObjects, parameters, and default extension sets for when you’re not simulating the full platform, exactly what SE mode does. No GIC, no firmware, just CPU, memory, and syscall emulation. Keeping it here lets the ISA model grab it without dragging in all the full-system baggage.

class ArmDefaultSERelease(ArmRelease):
    extensions = [
        ...
        # Armv8.5
        "FEAT_FLAGM2",
        "FEAT_MTE",   # <--
        "FEAT_MTE2",  # <--
        ...
    ]
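
Because the FS and SE defaults live in two different files, it is easy to update one and forget the other. A tiny check like the one below can catch that. This is only a sketch: the two lists are abbreviated stand-ins rather than the full gem5 lists, and `missing_from` is a hypothetical helper.

```python
REQUIRED = ("FEAT_MTE", "FEAT_MTE2")

# Abbreviated stand-ins for the two default extension lists shown above.
fs_default = ["FEAT_FLAGM2", "FEAT_EVT", "FEAT_MTE", "FEAT_MTE2"]
se_default = ["FEAT_FLAGM2", "FEAT_MTE", "FEAT_MTE2"]


def missing_from(extensions, required=REQUIRED):
    """Return the required features that are absent from an extension list."""
    return [feat for feat in required if feat not in extensions]


for mode, extensions in (("FS", fs_default), ("SE", se_default)):
    missing = missing_from(extensions)
    print(mode, "ok" if not missing else f"missing {missing}")
```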

Set the mte field in ID_AA64PFR1_EL1

According to the Armv8.5 architecture spec, the mte field in the ID_AA64PFR1_EL1 register tells you whether MTE is present on a system (and its version). In gem5, this register is implemented as one of the “misc” registers, which you’ll find in regs/misc.hh.

namespace ArmISA
{
    enum MiscRegIndex {
        ...
        // AArch64 registers (Op0=1,3)
        ...
        MISCREG_ID_AA64PFR0_EL1,
        MISCREG_ID_AA64PFR1_EL1,
        ...
    };
    const char * const miscRegName[] = {
        ...
        // AArch64 registers (Op0=1,3)
        ...
        "id_aa64pfr0_el1",
        "id_aa64pfr1_el1",
        ...
    };
}

The ID_AA64PFR1_EL1 system register and its fields are already listed in regs/misc_types.hh, but the field definitions aren’t fully filled in yet.

BitUnion64(AA64PFR1)
    Bitfield<27, 24> sme;
    Bitfield<19, 16> mpamFrac;
EndBitUnion(AA64PFR1)

The figure below, based on the field descriptions in the Arm A-profile architecture documentation, shows the full layout of the register.

Figure: Bit field layout of the ID_AA64PFR1_EL1 system register

We’ll add the missing bit fields to the BitUnion64 definition in regs/misc_types.hh based on the Arm spec:

BitUnion64(AA64PFR1)
    Bitfield<63, 60> pfar;
    Bitfield<59, 56> df2;
    Bitfield<55, 52> mtex;
    Bitfield<51, 48> the;
    Bitfield<47, 44> gcs;
    Bitfield<43, 40> mteFrac;
    Bitfield<39, 36> nmi;
    Bitfield<35, 32> csv2Frac;
    Bitfield<31, 28> rndrTrap;
    Bitfield<27, 24> sme;
    Bitfield<23, 20> res0;
    Bitfield<19, 16> mpamFrac;
    Bitfield<15, 12> rasFrac;
    Bitfield<11, 8> mte;  // <- this field is what we care about for now 
    Bitfield<7, 4> ssbs;
    Bitfield<3, 0> bt;
EndBitUnion(AA64PFR1)
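
When transcribing a layout like this by hand, it is easy to leave a gap or overlap two fields. The sketch below mirrors the bit ranges from the BitUnion in a plain Python table and checks that they tile all 64 bits exactly once. The table and helper names are illustrative and not part of gem5; bits [63:60] are named after the architectural PFAR field.

```python
# (name, msb, lsb) for each field, in the same order as the BitUnion.
AA64PFR1_FIELDS = [
    ("pfar", 63, 60), ("df2", 59, 56), ("mtex", 55, 52), ("the", 51, 48),
    ("gcs", 47, 44), ("mteFrac", 43, 40), ("nmi", 39, 36),
    ("csv2Frac", 35, 32), ("rndrTrap", 31, 28), ("sme", 27, 24),
    ("res0", 23, 20), ("mpamFrac", 19, 16), ("rasFrac", 15, 12),
    ("mte", 11, 8), ("ssbs", 7, 4), ("bt", 3, 0),
]


def covered_bits(fields) -> int:
    """Return the mask of bits covered by fields; raise if two fields overlap."""
    mask = 0
    for name, msb, lsb in fields:
        field_mask = ((1 << (msb - lsb + 1)) - 1) << lsb
        if mask & field_mask:
            raise ValueError(f"field {name} overlaps an earlier field")
        mask |= field_mask
    return mask


def unpack(value: int, fields=AA64PFR1_FIELDS) -> dict:
    """Split a raw 64-bit value into a {field name: field value} dict."""
    return {
        name: (value >> lsb) & ((1 << (msb - lsb + 1)) - 1)
        for name, msb, lsb in fields
    }


# A complete layout covers every one of the 64 bits.
assert covered_bits(AA64PFR1_FIELDS) == (1 << 64) - 1
print(unpack(0x200)["mte"])  # 2
```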

Now that we’ve defined the fields for ID_AA64PFR1_EL1, the next stop is misc.cc to actually initialize them. This is where we set the mte field so our simulated CPU advertises FEAT_MTE.

We could also set other related fields like mtex and mteFrac to enable newer MTE versions (FEAT_MTE3, FEAT_MTE4) or async tagging (FEAT_MTE_ASYNC), but we’ll keep that for later so we don’t bite off too much at once.

In the snippet below, we initialize ID_AA64PFR1_EL1 based on the CPU’s extension list. FEAT_MTE2 is checked first: if the list includes it, we set the mte field to 0x2; otherwise, if the list includes FEAT_MTE, we set it to 0x1; otherwise, it remains 0x0. The same pattern is used for other features like FEAT_SME and FEAT_MPAM. This way, the register’s reported features always match what we’ve configured in the CPU’s extension set.

InitReg(MISCREG_ID_AA64PFR1_EL1)
  .reset([release=release](){
      AA64PFR1 pfr1_el1 = 0;
      uint8_t mte_level = 0;
      if (release->has(ArmExtension::FEAT_MTE2)) {
          mte_level = 0x2;
      } else if (release->has(ArmExtension::FEAT_MTE)) {
          mte_level = 0x1;
      }
      pfr1_el1.mte = mte_level;
      pfr1_el1.sme = release->has(ArmExtension::FEAT_SME) ? 0x1 : 0x0;
      pfr1_el1.mpamFrac = release->has(ArmExtension::FEAT_MPAM) ?
          0x1 : 0x0;
      return pfr1_el1;
  }())
  .unserialize(0)
  .faultRead(EL0, faultIdst)
  .faultRead(EL1, faultHcrEL1<&HCR::tid3>)
  .allPrivileges().writes(0);
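
To sanity-check the value this reset logic should produce, here is a small Python mirror of the C++ lambda. It is a sketch for checking expected values by hand, not gem5 code: the function name is made up, and the shift amounts are the low bits of the mte, mpamFrac, and sme fields from the BitUnion above.

```python
# Low bit of each field, taken from the BitUnion definition.
MTE_LSB = 8
MPAM_FRAC_LSB = 16
SME_LSB = 24


def reset_id_aa64pfr1(extensions) -> int:
    """Compute the reset value the same way the C++ lambda does."""
    # FEAT_MTE2 takes precedence over FEAT_MTE, as in the C++ snippet.
    if "FEAT_MTE2" in extensions:
        mte = 0x2
    elif "FEAT_MTE" in extensions:
        mte = 0x1
    else:
        mte = 0x0
    sme = 0x1 if "FEAT_SME" in extensions else 0x0
    mpam_frac = 0x1 if "FEAT_MPAM" in extensions else 0x0
    return (mte << MTE_LSB) | (mpam_frac << MPAM_FRAC_LSB) | (sme << SME_LSB)


print(hex(reset_id_aa64pfr1({"FEAT_MTE", "FEAT_MTE2"})))  # 0x200
print(hex(reset_id_aa64pfr1({"FEAT_MTE"})))               # 0x100
```

With both features in the extension list, the guest should read 0x200 in the low bits of ID_AA64PFR1_EL1, which is the value Linux needs to see before it reports MTE.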

The figure below shows the path FEAT_MTE takes through gem5, from being enabled in your CPU config, to showing up in ID_AA64PFR1_EL1, and finally being visible to software.

Figure: Flow of FEAT_MTE through gem5’s CPU model to software

Time to See It in Action (and Why It Won’t Work… Yet)

Alright, so we’ve told gem5 “Hey, this CPU has MTE!”. Naturally, the next thing you might want to do is boot Linux and check /proc/cpuinfo for that sweet mte flag. Here’s the catch: the moment the Linux kernel sees MTE in ID_AA64PFR1_EL1, it wastes no time putting it to work. Very early in boot, before you even get any output on the terminal, it tries to:

  • Seed the random tag generator (RGSR_EL1)
  • Configure the tag control policy (GCR_EL1)
  • Ask the hardware about supported block sizes (GMID_EL1)

In our current gem5 build, none of these registers exist yet. The CPU model just shrugs when Linux pokes them, and you get an unimplemented register trap. Since this happens long before the console is ready, the boot just… stops. No error message, no panic, no dmesg; just a silent hang.

Testing in FS Mode

Let’s confirm this by booting the Linux kernel in gem5 FS mode. Even though we know the boot will fail, it’s a good sanity check, and it’s the fastest way to confirm which registers Linux expects.

Build gem5

From the gem5 root directory, build the ARM target:

scons -j$(nproc) build/ARM/gem5.opt

Building gem5 can take quite a while — patience required :’(

Get the Kernel and Disk Image

We’ll use a ready-to-go Linux v5.10 kernel and Ubuntu 18.04 disk image from gem5’s public resources, but any kernel ≥ 5.10 should work.

mkdir -p fs_files
cd fs_files

# kernel
wget https://gem5dist.blob.core.windows.net/dist/develop/kernels/arm/static/arm64-vmlinux-5.10.110

# disk image
wget https://gem5dist.blob.core.windows.net/dist/develop/images/arm/ubuntu-18-04/arm64-ubuntu-20220425.img.gz
gunzip arm64-ubuntu-20220425.img.gz

Run the FS simulation

Run gem5 with the FS config script and our resources. We’ll enable debug flags to see the issues that may arise when the kernel accesses unimplemented registers.

build/ARM/gem5.opt \
    --debug-flags=Arm,MiscRegs,ExecAll \
    --debug-start=0 \
    --debug-end=100000000 \
    --debug-file=boot_debug.log \
    configs/example/arm/starter_fs.py \
    --cpu atomic \
    --num-cores 1 \
    --mem-size 1GB \
    --kernel fs_files/arm64-vmlinux-5.10.110 \
    --disk-image fs_files/arm64-ubuntu-20220425.img

As soon as gem5 starts the simulation, it prints something like:

system.terminal: Listening for connections on port 3456

You can connect to this port via telnet to watch the kernel boot in real time and, once boot completes, use the terminal of the simulated system:

telnet localhost 3456

Spot the Problem

The boot will appear frozen. That’s your cue to kill the simulation and go hunting in the debug log.

If you open m5out/boot_debug.log and search for the keyword unimplemented, you’ll find an MSR instruction that tries to write to an unimplemented register.

grep -i "unimplemented" m5out/boot_debug.log

You’ll see something like:

1002250: system.cpu_cluster.cpus: A0 T0 : 0x80793668 @kernel_init.__cpu_setup+112    : msr op0:3 op1:0 crn:1 crm:0 op2:6 (unimplemented) : No_OpClass :   flags=(IsNonSpeculative|IsInvalid)

Looking up this encoding (op0:3 op1:0 crn:1 crm:0 op2:6) in the Arm documentation shows that it corresponds to GCR_EL1 (Tag Control Register). The kernel writes GCR_EL1 (MSR is a system-register write) in the very early __cpu_setup assembly code, immediately after detecting the feature. Without that register, the kernel can’t even finish CPU setup.
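
Looking up encodings by hand gets tedious once more registers turn up, so here is a small Python sketch that decodes such a log line. It assumes the log format matches the single sample line above, and the helper name is made up. The GCR_EL1 entry comes from the trace; the RGSR_EL1 and GMID_EL1 encodings are taken from Arm’s register documentation, not from this log.

```python
import re

# (op0, op1, crn, crm, op2) -> register name, for the MTE registers in this post.
SYSREG_NAMES = {
    (3, 0, 1, 0, 5): "RGSR_EL1",
    (3, 0, 1, 0, 6): "GCR_EL1",
    (3, 1, 0, 0, 4): "GMID_EL1",
}

# Pattern derived from the sample line only; other gem5 versions may differ.
PATTERN = re.compile(
    r"\b(?P<insn>msr|mrs)\s+op0:(\d+)\s+op1:(\d+)\s+crn:(\d+)"
    r"\s+crm:(\d+)\s+op2:(\d+)\s+\(unimplemented\)"
)

SAMPLE = (
    "1002250: system.cpu_cluster.cpus: A0 T0 : 0x80793668 "
    "@kernel_init.__cpu_setup+112    : msr op0:3 op1:0 crn:1 crm:0 op2:6 "
    "(unimplemented) : No_OpClass :   flags=(IsNonSpeculative|IsInvalid)"
)


def decode_unimplemented(line: str):
    """Return (instruction, register name) for an unimplemented access, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    encoding = tuple(int(group) for group in match.groups()[1:])
    return match.group("insn"), SYSREG_NAMES.get(encoding, "unknown")


print(decode_unimplemented(SAMPLE))  # ('msr', 'GCR_EL1')
```

Running every line of the grep output through `decode_unimplemented` gives a quick list of which registers still need to be modeled.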

Where We Go from Here

Part 2 will be all about giving Linux just enough MTE to survive boot (or at least get past the first few hurdles). That means implementing the key system registers (GCR_EL1, RGSR_EL1, GMID_EL1, …) so the kernel’s setup code stops tripping over missing hardware. Once those are in place, the boot can finally move forward. Stay tuned!

This post is licensed under CC BY 4.0 by the author.