StackOverflow Questions for Tag: micro-optimization

ABu

Reputation: 12299

Performance penalty: denormalized numbers versus branch mis-predictions

Score: 5

Views: 1920

Answers: 2

Poperton

Reputation: 1746

How to auto-vectorize (SIMD) a modular multiplication in Rust

Score: 2

Views: 147

Answers: 1

BadUsernameIdea

Reputation: 230

Is it still worth using the Quake fast inverse square root algorithm nowadays on x86-64?

Score: 13

Views: 5904

Answers: 1

rpax

Reputation: 4496

DateTime.DayOfWeek micro optimization

Score: 23

Views: 13140

Answers: 1

Elliot Gorokhovsky

Reputation: 3762

Generate FMOV without inline assembly

Score: 3

Views: 129

Answers: 2

user2346536

Reputation: 1474

popcount in arm assembly without neon

Score: 1

Views: 929

Answers: 3

WilliamKF

Reputation: 43179

Branch on ?: operator?

Score: 9

Views: 3698

Answers: 5

namea hang

Reputation: 11

Which execution ports can SIMD shuffles use for AVX2 and NEON?

Score: 1

Views: 96

Answers: 1

Andreas

Reputation: 1285

Micro Optimization of a 4-bucket histogram of a large array or list

Score: 3

Views: 1118

Answers: 4

George Robinson

Reputation: 2125

Bit packing of groups of n repeated bits in a 32-bit word, compact to 1 bit per group

Score: 24

Views: 2601

Answers: 6

Gilgamesz

Reputation: 5073

INC instruction vs ADD 1: Does it matter?

Score: 61

Views: 16785

Answers: 2

Hassan Syed

Reputation: 20496

boost::thread data structure sizes on the ridiculous side?

Score: 7

Views: 3281

Answers: 6

Hassan Syed

Reputation: 20496

Why is clang's `-O3` alloca 2x faster than g++ on a simplistic alloca benchmark

Score: 5

Views: 1280

Answers: 1

Ben

Reputation: 9723

Why can't GCC generate an optimal operator== for a struct of two int32s?

Score: 87

Views: 5107

Answers: 3

Victor Liu

Reputation: 3653

Verifying compiler optimizations in gcc/g++ by analyzing assembly listings

Score: 16

Views: 7561

Answers: 8

Hassan Syed

Reputation: 20496

Preserving the Execution pipeline with branch layout in C source? Which prediction do CPUs or compilers start with?

Score: 1

Views: 160

Answers: 2

muser

Reputation: 185

Why is this reordering of sub and mul instructions helpful?

Score: 3

Views: 141

Answers: 2

Thilo

Reputation: 262794

Cost of exception handlers in Python

Score: 152

Views: 80871

Answers: 5

Noah

Reputation: 1759

Why does a NOP (as a 5th uop) speed up a 4 uop loop on Ice Lake?

Score: 13

Views: 499

Answers: 0

user541686

Reputation: 210735

Is it possible to make MSVC's __assume(0) aka std::unreachable() actually optimize?

Score: 3

Views: 385

Answers: 1
