StackOverflow Questions for Tag: neon

AG1

Reputation: 6774

Pack high bit of every byte in ARM, for 64 bytes like AVX512 vpmovb2m?

Score: 3

Views: 257

Answers: 3

AG1

Reputation: 6774

__builtin_neon_vqtbl4q_v: what is the difference between the first arg (__a) and the sixth arg (__c)?

Score: -1

Views: 58

Answers: 1

fuz

Reputation: 93127

How do I cast a vector to a float64_t to check a SIMD compare for all-zero?

Score: 3

Views: 126

Answers: 1

curiouscupcake

Reputation: 1277

Accelerating matrix vector multiplication with ARM Neon Intrinsics on Raspberry Pi 4

Score: 1

Views: 2121

Answers: 3

fabian

Reputation: 1881

How to Load and Store data for the new AVX-VNNI and Arm Neon MMLA instructions efficiently?

Score: 1

Views: 104

Answers: 1

miluz

Reputation: 1423

Fastest way to test a 128 bit NEON register for a value of 0 using intrinsics?

Score: 7

Views: 3899

Answers: 6

user2092113

Reputation: 103

ARM NEON vectorization failure

Score: 5

Views: 4113

Answers: 2

ironman

Reputation: 3

Accumulate vector using Neon and print to stdout (assembly)

Score: 0

Views: 86

Answers: 2

Thomas Lavergne

Reputation: 38

vfmlalq_low_f16 and vfmlalq_high_f16 not setting their first operand to the result

Score: 0

Views: 38

Answers: 1

namea hang

Reputation: 11

Which execution ports can SIMD shuffles use for AVX2 and NEON?

Score: 1

Views: 97

Answers: 1

A23149577

Reputation: 2155

Accessing half of a register in AArch64 advanced SIMD

Score: 1

Views: 1279

Answers: 3

Steve Fan

Reputation: 3361

What is the fastest algorithm in SIMD to compare a pattern with a bit mask?

Score: 1

Views: 143

Answers: 1

Morteza

Reputation: 109

How to further optimize matrix multiplication in llm.c project?

Score: 4

Views: 179

Answers: 1

Bruno Causse

Reputation: 11

Why ARM NEON intrinsics are not faster than plain C++ for finding legal Othello moves?

Score: 1

Views: 122

Answers: 2

user27680699

Reputation: 51

How to exactly find the first matching zero in ARM using `shrn`, `fmov`, `rbit`, `clz`?

Score: 3

Views: 139

Answers: 1

leeyee

Reputation: 265

Compile ARM Neon intrinsics on macos (M3 chipsets) using clang

Score: 0

Views: 406

Answers: 1

Mikhail T.

Reputation: 4017

Compiling assembly-code on ARMv7: CLang vs. GNU

Score: 1

Views: 97

Answers: 1

Zvi Vered

Reputation: 623

ARM Intrinsic: Insert complex zero after each complex float sample

Score: 0

Views: 44

Answers: 1

HaggarTheHorrible

Reputation: 7423

ARM Cortex-A8: What's the difference between VFP and NEON

Score: 52

Views: 40744

Answers: 5

Mono

Reputation: 81

Accessing NEON intrinsics from Go

Score: 3

Views: 96

Answers: 1
