Switching speed of a processor

7/14/2023

By putting __assume(false) in the default case, we are basically telling the optimizer that we will (or at least should) never reach it, so it avoids generating the additional conditional check and branch.

It eliminates that extra conditional check and branch for unhandled switch cases. As far as I can tell from peeking at the generated disassembly, it does the same thing computed gotos do. When I put __assume(false) in the default case of my switch statement in the MSVC branch of the code, it sped things up just about as much as computed gotos did, with an immediate 30-35% reduction in times on all benchmarks.

Unfortunately, I couldn't use GCC and computed gotos on Windows, since our build system, including all of our build servers, uses MSVC there. Yet after digging through the compiler docs, I found this gem: on MSVC, using __assume(false) or __assume(0), as documented here.

Computed gotos were an immediate win: I got about a 30-35% reduction in times on OSX and Linux (where we use GCC) on all benchmarks. The reason this helped so much, even though the optimizers I was using were producing perfect one-to-one jump tables, is that the optimizers were still being conservative about checking for unhandled cases in the switch. That conservative check applies even if you use a strongly-typed enum class and have a case for every possible enumerated constant, with all the compilers I tested against. So even in the best-case scenario, the raw switch statement produces not one jump but two: the range check for unhandled values, plus the jump through the table. Computed gotos can reduce that down to just one jump.

As a practice project I'm working on a CPU simulator (it runs at about 1.78 MHz), and I'm using a switch statement to execute the correct opcode based on the value in the IR (instruction register) variable. The switch may not be that big, but it needs to be executed many times in a short period of time. Are there any better ways to write fast code than using a switch statement for this purpose?

The opcodes can be broken into two parts: the addressing mode and the actual operation. For compact code, these could probably be arranged into functions and placed in the cases based on which ones are needed. I'm just not sure whether writing each one out separately would be much more efficient, even if it makes the code itself larger.

Another idea, which I'm not sure how to do yet, is to detect the addressing mode and the operation from the opcode before the switch statement(s), then use one switch to run the addressing mode and get the effective address, and a second switch to perform the actual operation (and write back to memory if it was an RMW instruction).

So, any thoughts? Is a switch statement the best choice here, and what other optimizations can I make to ensure the simulation runs smoothly?

I recently worked on this problem for a bytecode interpreter for a proprietary scripting language we use, and managed some very nice improvements using a combo of techniques (unfortunately compiler-specific). I can't explain perfectly how and why they all worked to improve efficiency, since I'm neither a compiler design nor a computer architecture wizard. Mostly I just wrote some benchmarks, fired up profilers on various platforms, and tried a bunch of things while reading papers and articles on interpreter design (some efforts succeeded in improving times, some failed) and peeking at the disassembly. Still, here is what I found most beneficial: using computed gotos on GCC, as described here, to replace the switch.
