JSJ

Reputation: 121

Low interrupt latency via dedicated architectures and operating systems

This question may seem slightly vague, but I am researching how interrupt systems work and what determines their latency. I am trying to understand how architecture facilities such as FIQ on ARM help decrease latency. How does this differ from using an operating system that does not have access to, or cannot provide access to, these facilities? For example, Windows RT is made for ARM, and this operating system cannot be ported to other architectures.

Simply put - how is interrupt latency different in dedicated architectures that have dedicated operating systems as compared to operating systems that can be ported across many different architectures (Linux for example)?

Sorry for the rant - I'm pretty confused as you can probably tell.

Upvotes: 0

Views: 242

Answers (1)

SilverCode

Reputation: 209

I'll start with your Windows RT example. Windows RT is a port of Windows to the ARM architecture; it is not a 'dedicated operating system'. There are (probably) many OSes that run on only one architecture, but that is more a matter of nobody having bothered to port them, for whatever reason.

What does 'port' really mean though?

Windows has a kernel (we'll call it NT here, the name doesn't matter) and that NT kernel has a bunch of concepts that need to be implemented. These concepts are things like timers, memory virtualisation, exceptions, etc.

These concepts are implemented differently between architectures, so porting the kernel and drivers (I will ignore the rest of the OS here, often that is a recompile only) is a matter of using the available pieces of silicon to implement the required concepts. This implementation is called a 'port'.

Let's zoom in on interrupts (one kind of exception, in ARM terms) on an ARM that has FIQ and IRQ. In general an interrupt can occur asynchronously, by which I mean at any time. The CPU is generally busy doing something when an IRQ is asserted, so that context (we'll call it UserContext1) needs to be stored before the handler can use any resources in use by UserContext1. Generally this means storing registers on the stack before using them. On ARM, when an IRQ occurs the CPU switches to IRQ mode. Registers r13 and r14 have their own copy for IRQ mode; the rest need to be saved if they are used, so that is what happens. Those stores to memory take some time. The IRQ is handled, UserContext1 is popped back off the stack, then IRQ mode is exited.

So the latency in this case might be the time from IRQ assertion to the time the IRQ vector starts executing. That is going to be some number of clock cycles, depending on what the CPU was doing when the IRQ happened. The latency before the IRQ handling can occur is the time from the IRQ assertion to the time the CPU has finished storing the context. The latency before user mode code can execute depends on too much stuff in the OS/kernel to explain here, but the minimum boils down to the time from the IRQ assertion to the return after restoring UserContext1, plus the time for the OS context switch.
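To make those three definitions concrete, here is a small sketch that just adds up cycle counts. The class name, field names and every number in it are made up for illustration; real values depend entirely on the core, the memory system and the OS.

```python
from dataclasses import dataclass


@dataclass
class InterruptTiming:
    """Cycle counts for one interrupt; every value used below is a made-up example."""
    cycles_to_vector: int        # IRQ assertion -> first instruction at the IRQ vector
    cycles_save_context: int     # pushing UserContext1 registers onto the stack
    cycles_handler: int          # the actual interrupt servicing
    cycles_restore_context: int  # popping UserContext1 and leaving IRQ mode
    cycles_os_switch: int        # OS context switch before user-mode code runs

    def vector_latency(self) -> int:
        # Assertion until the vector starts executing.
        return self.cycles_to_vector

    def handler_latency(self) -> int:
        # Assertion until the context has been stored and real handling can start.
        return self.cycles_to_vector + self.cycles_save_context

    def user_latency(self) -> int:
        # Minimum time until user-mode code can react to the interrupt.
        return (self.handler_latency() + self.cycles_handler
                + self.cycles_restore_context + self.cycles_os_switch)


timing = InterruptTiming(
    cycles_to_vector=12,
    cycles_save_context=20,
    cycles_handler=50,
    cycles_restore_context=20,
    cycles_os_switch=300,
)
print(timing.vector_latency(), timing.handler_latency(), timing.user_latency())
```

The point of splitting it up like this is that hardware features mostly attack the save/restore terms, while the OS design dominates the last term.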

FIQ - If you are a hard-as-nails programmer you might only need 7 registers to completely handle your interrupt servicing. I mentioned that IRQ mode has its own copy of 2 registers; well, FIQ mode has its own copy of 7 registers (r8 to r14). Yup, that's 28 bytes of context that doesn't need to be pushed onto the stack (actually r14 is the link register and r13 is normally the stack pointer, so you really have r8 to r12 free, plus r13 if the handler needs no stack). That can remove the need to store and then restore UserContext1. Thus the latency can be reduced by up to the length of time needed to do that save/restore.
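The banked-register argument can be sketched as a tiny model. This is only bookkeeping, not real handler code: the function names and the example handler's register usage are assumptions, and it only counts registers, not cycles.

```python
# Registers that each mode gets a private (banked) copy of on classic ARM cores.
BANKED = {
    "irq": {"r13", "r14"},
    "fiq": {"r8", "r9", "r10", "r11", "r12", "r13", "r14"},
}
WORD_BYTES = 4  # 32-bit registers


def registers_to_save(mode, used_by_handler):
    """Registers the handler must push before touching them, sorted by number."""
    unbanked = set(used_by_handler) - BANKED[mode]
    return sorted(unbanked, key=lambda name: int(name[1:]))


def bytes_to_save(mode, used_by_handler):
    """Bytes of interrupted context that have to go onto the stack."""
    return WORD_BYTES * len(registers_to_save(mode, used_by_handler))


# Hypothetical handler that manages with five scratch registers.
handler_regs = {"r8", "r9", "r10", "r11", "r12"}

print("irq:", registers_to_save("irq", handler_regs), bytes_to_save("irq", handler_regs))
print("fiq:", registers_to_save("fiq", handler_regs), bytes_to_save("fiq", handler_regs))
```

For this handler the IRQ path has five registers to push and pop, while the FIQ path has none, which is exactly the save/restore time FIQ lets you skip.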

None of this has much to do with the OS. The OS can choose to use or not use these features. The OS can choose to make guarantees about how long it will take to execute the OS's concept of an interrupt handler, or it may not. This is one of the basic concepts of an RTOS: the contract about how long it takes before the handler will run. The OS is designed for some purpose (and that purpose may be 'general') - that design goal will have a lot more effect on latency than how many targets the OS has been ported to.

Go have a read about something like FreeRTOS, then buy some hardware and try it. Instrument the code to measure the latencies you really want to look at. It will likely be the best way to get your head around it.
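Once you have pairs of timestamps off the board, the offline number crunching might look something like this. It assumes you logged a free-running 32-bit cycle counter at IRQ assertion and again at handler entry; the counter width, the function names and the sample values are all assumptions, not anything FreeRTOS gives you.

```python
COUNTER_BITS = 32  # assumed width of the free-running cycle counter


def elapsed_cycles(start, end, bits=COUNTER_BITS):
    """Cycles from start to end, still correct if the counter wrapped once."""
    return (end - start) % (1 << bits)


def summarize(samples):
    """samples: (count_at_assert, count_at_handler_entry) pairs -> (min, max, mean)."""
    latencies = [elapsed_cycles(start, end) for start, end in samples]
    return min(latencies), max(latencies), sum(latencies) / len(latencies)


# Made-up samples; the last pair straddles a counter wraparound.
samples = [(1000, 1040), (5000, 5036), (0xFFFFFFF0, 0x00000020)]
print(summarize(samples))
```

For anything real-time the max is the number to care about, since the RTOS contract is about the worst case, not the average.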

(*Multi-CPU systems do it the same way, but with some synchronization and barrier functions and a sprinkling of complexity)

Upvotes: 1
