Mark Adler

Reputation: 112239

How to detect crc32 on aarch64

Is it possible for a user program on aarch64 to detect whether crc32 instructions are available? I have found references to kernel support for such detection, implying that the registers with the information about what instructions will work in user mode are not available in user mode (!).

Is that the case? Or is there a portable way to determine if the crc32 instructions are available?

Note: What I mean by "user program" and "portable" is an approach that requires neither privileged instructions nor operating-system-specific calls or files (e.g. /proc/cpuinfo). The code itself needs to be able to detect if the instructions are available and use them if they are, or fall back to an alternative if they are not. As an example, Intel processors have the cpuid instruction for this purpose.

Update:

Poking around in ARM architecture descriptions, I found a user-level register, PMCR_EL0, which provides an 8-bit implementer code and an 8-bit ID code for the processor. Perhaps if I could find a list of those codes, I might be closer to what I'm looking for.

Update 2:

However, when I try to read that register, I get an illegal instruction exception. So even EL0 registers require privileged access?

Upvotes: 5

Views: 2509

Answers (3)

Frant

Reputation: 5895

Update: the original answer did not answer the question, since its author wanted a universal piece of code running at EL0 that can determine whether the CRC32 feature is present, without any requirements on the operating system or bare-metal environment being used.

My understanding is that such code would need to access ID_AA64ISAR0_EL1, and because code running at EL0 cannot access it, a switch to a more privileged exception level would be required anyway.

In the same way, trapping an illegal instruction using a 'portable' section of code would require accessing a VBAR_ELx register, which cannot be done from a program running at EL0 that does not rely on an underlying operating system or privileged monitor.

Therefore, my answer to the question "Is that the case?" would be: yes, it is. A portable/universal section of code running at EL0 cannot determine whether the CRC32 feature is available.

This being said, the example code provided in the documentation referenced in the question is working fine on an Expressobin running aarch64 linux 4.14.80, and should be preferred to using getauxval() for the very reasons explained in the kernel documentation.
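For reference, here is a minimal sketch of that ID-register approach, assuming Linux on AArch64 with a kernel that emulates MRS reads of ID registers from user space (advertised through HWCAP_CPUID). The function names are mine, not taken from the kernel documentation, and the fallback value for HWCAP_CPUID is the arm64 one from <asm/hwcap.h>. The CRC32 field is bits [19:16] of ID_AA64ISAR0_EL1.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>               /* getauxval, AT_HWCAP */
#ifndef HWCAP_CPUID
#define HWCAP_CPUID (1UL << 11)     /* arm64 value from <asm/hwcap.h> */
#endif
#endif

/* Extract the CRC32 field, bits [19:16] of ID_AA64ISAR0_EL1.
 * A non-zero value means the CRC32 instructions are implemented. */
static unsigned crc32_field(uint64_t isar0)
{
    return (unsigned)((isar0 >> 16) & 0xf);
}

/* Hypothetical helper: returns true only when the kernel advertises MRS
 * emulation and the CRC32 field is non-zero. On any other platform it
 * returns false, which here means "not detected by this route". */
static bool crc32_via_id_register(void)
{
#if defined(__aarch64__) && defined(__linux__)
    if (!(getauxval(AT_HWCAP) & HWCAP_CPUID))
        return false;               /* without emulation the MRS would fault */
    uint64_t isar0;
    __asm__("mrs %0, ID_AA64ISAR0_EL1" : "=r"(isar0));
    return crc32_field(isar0) != 0;
#else
    return false;
#endif
}
```

Note that this still depends on the operating system: the MRS only works at EL0 because the kernel traps and emulates it.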

Upvotes: 1

Adenilson Cavalcanti

Reputation: 84

Not to the best of my knowledge.

The way I've implemented it in Chromium's zlib was using the available OS functionality: https://cs.chromium.org/chromium/src/third_party/zlib/arm_features.c?l=29

It is also relevant to mention that the crc32 instructions are an optional extension on ARMv8.0 (separate from the crypto extensions) and mandatory from ARMv8.1 onward. This means that runtime feature detection is necessary. For further details, please check: https://cs.chromium.org/chromium/src/third_party/zlib/BUILD.gn?l=64

I would avoid reading directly from /proc/cpuinfo, as it may not be available in some contexts (and, depending on the Android flavor, it may report a false negative).

In Chromium, zlib runs both in a privileged context (i.e. part of the network code in the main browser process) and in a sandboxed context (i.e. part of the RendererProcess in a tab). In the RendererProcess, reading from /proc/cpuinfo should fail.

A sledgehammer approach would be to install a signal handler and execute the instruction with inline asm. If the instruction is not available, it causes a fault that the handler can catch. Not recommended, though.
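To make the sledgehammer approach concrete, here is a rough sketch, assuming a POSIX system with sigaction() and sigsetjmp(). All function names are hypothetical. The `.arch_extension crc` directive is my assumption for getting the assembler to accept crc32b without extra compiler flags. The static jump buffer makes this unsafe to call from more than one thread at a time.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static sigjmp_buf probe_env;        /* shared state: not thread-safe */

static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);       /* abandon the faulting instruction */
}

/* Run fn() with a temporary SIGILL handler installed.
 * Returns true if fn() returned normally, false if it raised SIGILL. */
static bool runs_without_sigill(void (*fn)(void))
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return false;

    volatile bool ok = false;
    if (sigsetjmp(probe_env, 1) == 0) {   /* 1: save and restore the signal mask */
        fn();
        ok = true;
    }
    sigaction(SIGILL, &old, NULL);        /* put the previous handler back */
    return ok;
}

/* Stand-in that behaves like an unsupported instruction. */
static void always_faults(void)
{
    raise(SIGILL);
}

/* Executes one crc32b on AArch64; elsewhere acts as if it were missing. */
static void try_crc32(void)
{
#if defined(__aarch64__)
    unsigned int r;
    __asm__ volatile(".arch_extension crc\n\t"
                     "crc32b %w0, %w1, %w2"
                     : "=r"(r) : "r"(0u), "r"(0u));
    (void)r;
#else
    always_faults();
#endif
}
```

Calling runs_without_sigill(try_crc32) once at startup would give the answer, but it still relies on OS-specific signal APIs.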

The aforementioned example (https://github.com/torvalds/linux/blob/master/Documentation/arm64/cpu-feature-registers.txt) worked on one ARM board I tested (MachiatoBin) but failed on two others (rock64 and nanopi m4).

The approach implemented in Chromium works on all the boards (as well as on a few cellphones I've tested).

Another detail about getauxval: the correct flag depends on whether the code runs in 32-bit or 64-bit mode. In 64-bit mode it is HWCAP_CRC32 (queried with AT_HWCAP), while in 32-bit mode it is HWCAP2_CRC32 (queried with AT_HWCAP2).
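A minimal sketch of that 32-bit versus 64-bit distinction, assuming Linux and glibc-style getauxval(). The function names and the CRC32_BIT_* macros are mine. The bit positions are the ones the Linux <asm/hwcap.h> headers use for HWCAP_CRC32 (arm64) and HWCAP2_CRC32 (32-bit ARM). This is not the Chromium code, just an illustration of the idea.

```c
#include <stdbool.h>

#if defined(__linux__)
#include <sys/auxv.h>               /* getauxval, AT_HWCAP, AT_HWCAP2 */
#endif

/* Bit positions as defined in the Linux <asm/hwcap.h> headers. */
#define CRC32_BIT_ARM64 (1UL << 7)  /* HWCAP_CRC32, reported in AT_HWCAP   */
#define CRC32_BIT_ARM32 (1UL << 4)  /* HWCAP2_CRC32, reported in AT_HWCAP2 */

/* Pure decision logic, kept separate from the system call. */
static bool crc32_from_hwcaps(unsigned long hwcap, unsigned long hwcap2,
                              bool is_64bit)
{
    return is_64bit ? (hwcap & CRC32_BIT_ARM64) != 0
                    : (hwcap2 & CRC32_BIT_ARM32) != 0;
}

/* Queries the auxiliary vector; returns false on non-ARM or non-Linux. */
static bool cpu_has_crc32(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return crc32_from_hwcaps(getauxval(AT_HWCAP), 0, true);
#elif defined(__linux__) && defined(__arm__)
    return crc32_from_hwcaps(0, getauxval(AT_HWCAP2), false);
#else
    return false;
#endif
}
```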

About the sledgehammer approach: signals are prone to race conditions, and you would still rely on OS-specific APIs (i.e. to install the signal handler).

Finally, depending on the context, if a given task crashes (even if by design and isolated from the execution context), it will trigger red flags.

This is a point (i.e. feature detection) where life is way easier on x86.

That being said, it may be an acceptable compromise to rely on the OS features. We have been shipping the linked code in Chromium since release M66 (current stable is M72); it first landed almost one year ago, with no ill reports.

One consideration on Android was that internally the NDK may implement android_getCpuFeatures() using a dlopen()/dlsym() and that can add around 500us to 1000us at first startup, which is why we cache the result of the CPU feature detection.

Another consideration for multithreaded apps (like Chromium) was the need for a thread barrier (i.e. pthread_once_t) to avoid race conditions while performing the CPU feature detection.
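The caching and the once-only guard can be sketched as follows, assuming POSIX threads. This is not the Chromium code. detect_crc32_slow() is a hypothetical placeholder for whatever OS query is used, and detect_calls exists only to show that detection runs a single time.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_once_t detect_once = PTHREAD_ONCE_INIT;
static bool crc32_available;        /* cached result of the detection */
static int detect_calls;            /* illustration only: counts detections */

/* Hypothetical placeholder for the real (possibly slow) OS query,
 * e.g. getauxval() on Linux or android_getCpuFeatures() on Android. */
static bool detect_crc32_slow(void)
{
    return false;
}

/* Runs at most once, no matter how many threads race to get here. */
static void detect_features(void)
{
    detect_calls++;
    crc32_available = detect_crc32_slow();
}

/* Cheap accessor: the first call pays for detection, later calls reuse it. */
static bool have_crc32(void)
{
    pthread_once(&detect_once, detect_features);
    return crc32_available;
}
```

Every caller goes through have_crc32(), so the startup cost is paid once and concurrent first calls cannot observe a half-initialized flag.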

Upvotes: 4

Martin Zeitler

Reputation: 76589

This might not be directly accessible, but ARM provides specifications for each processor, so there is a chance to create a chart that can be used to look up CPU features by model name. /proc/cpuinfo is Linux-specific; the Windows equivalent would be WMI; OSX does not run on ARM (as far as I know). Unless it is a type 1 hypervisor, which bypasses the operating system entirely, there has to be OS-specific code (and the user can also disable VT).

Upvotes: 0
