Reputation: 488
I am working in an embedded Linux environment, debugging a highly timing-sensitive issue related to the pairing/binding of Zigbee devices.
Our architecture is such that data is read from the Zigbee Front End Module via the SPI interface and then passed from kernel space to user space for processing. The processed data and response are then passed back to kernel space and clocked out over the SPI interface again.
The Zigbee 802.15.4 timing requirements specify that we need to respond within 19.5ms. We frequently respond just outside this window, which results in a failure and packet loss on the network.
The Linux kernel is not running with preemption enabled, and it may not be possible to enable it.
My suspicion is that, since the kernel is not preemptible, another task/process is using the ioctl() interface and holds off the Zigbee application just long enough that the 19.5ms window is exceeded.
I have tried the following tools
Are there any other lightweight methods of profiling a system like this?
Is there any way to catch when an ioctl call is held up by another task/thread? (assuming this is the root cause of the issue)
Upvotes: 2
Views: 310
Reputation: 2754
LTTng is the tool you are looking for. Like Oprofile, it profiles the entire system, but you will be able to see exactly what is going on with each process and kernel thread, in a timeline fashion. You will be able to view the interaction of the threads and scheduler around the point of interest, that is, when you miss your Zigbee deadline. You may have to get clever and use some method of triggering (or more likely, stopping) the LTTng trace once you've detected the missed packet, or you might get lucky and catch it right away just using the command line tools to start and stop tracing.
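One way to stop the trace once a miss is detected is sketched below. It is only an illustration under stated assumptions: the helper names (`now_ns`, `deadline_missed`, `check_and_stop_trace`) are hypothetical, the timestamps are taken in user space (so they understate the true SPI-to-SPI latency), and the stop command is passed in as a string, with `"lttng stop"` as the intended value when an LTTng session is active.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define ZIGBEE_DEADLINE_NS 19500000LL  /* 19.5 ms window from the question */

/* Monotonic timestamp in nanoseconds (unaffected by wall-clock changes). */
int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Returns 1 if the response took longer than the deadline, else 0. */
int deadline_missed(int64_t rx_ns, int64_t tx_ns)
{
    return (tx_ns - rx_ns) > ZIGBEE_DEADLINE_NS;
}

/*
 * Hypothetical hook: call after each response has been handed back to
 * the kernel. On a miss it runs stop_cmd (e.g. "lttng stop") so the
 * trace ends close to the event of interest.
 * Returns 0 if on time, 1 if missed and the command succeeded,
 * -1 if missed and the command failed.
 */
int check_and_stop_trace(int64_t rx_ns, int64_t tx_ns, const char *stop_cmd)
{
    int rc;

    if (!deadline_missed(rx_ns, tx_ns))
        return 0;
    rc = system(stop_cmd);
    return rc == 0 ? 1 : -1;
}
```

In the application this would look roughly like `rx = now_ns();` when the frame arrives in user space, then `check_and_stop_trace(rx, now_ns(), "lttng stop");` after the response is written back. Spawning a shell via `system()` is slow, which is acceptable here only because it runs after the deadline has already been missed.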
You may have to do some work to get there. For example, you'll have to invest some time and energy in 1) enabling your kernel to run LTTng if it doesn't have it already, and 2) learning how to use it. It is a powerful tool, and useful for a variety of profiling and analysis tasks. Most commercial embedded Linux vendors have complete end-to-end LTTng products and configuration, if you have that option. If not, you should be able to find plenty of useful help and examples online. LTTng has been around for a very long time! Happy hunting!
Upvotes: 0
Reputation: 40649
Good question. Here's an idea. Don't think of it as profiling. Think of catching it in the act.
I would investigate creating a watchdog timer to go off after the 19.5ms interval. Whenever you are successful, reset the timer. That way, it will only go off when there's a failure. At that point, I would try to take a stack sample of the process, or possibly another process that might be blocking it.
That's an adaptation of this technique. It will take some work, but I'd be surprised if there's any tool that will tell you exactly what's going on, short of an in-circuit emulator.
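A minimal user-space sketch of the watchdog idea follows. It rests on assumptions not in the question: a glibc-based system (for `backtrace()`), a one-shot `setitimer()` timer delivering `SIGALRM`, and hypothetical function names (`watchdog_init`, `watchdog_arm`, `watchdog_disarm`). It samples only the Zigbee process's own user-space stack, not the kernel or any other process.

```c
#define _GNU_SOURCE
#include <execinfo.h>   /* backtrace(): glibc-specific, an assumption */
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define DEADLINE_US 19500  /* 19.5 ms response window from the question */
#define MAX_FRAMES  32

static volatile sig_atomic_t deadline_misses;

/* Runs only if the timer expires, i.e. the response was not sent in time. */
static void on_deadline_miss(int signo)
{
    void *frames[MAX_FRAMES];
    int n;

    (void)signo;
    deadline_misses++;
    /* Dump the interrupted call stack to stderr without calling malloc. */
    n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
}

/* Install the SIGALRM handler once at startup. Returns 0 on success. */
int watchdog_init(void)
{
    struct sigaction sa;
    void *warmup[1];

    /* The first backtrace() call may load libgcc and allocate memory,
     * so trigger that here rather than inside the signal handler. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_deadline_miss;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGALRM, &sa, NULL);
}

/* Start (or restart) the one-shot timer when a request arrives. */
int watchdog_arm(void)
{
    struct itimerval it;

    memset(&it, 0, sizeof it);          /* it_interval = 0: fire once */
    it.it_value.tv_usec = DEADLINE_US;
    return setitimer(ITIMER_REAL, &it, NULL);
}

/* Cancel the timer once the response has been sent in time. */
int watchdog_disarm(void)
{
    struct itimerval it;

    memset(&it, 0, sizeof it);          /* all-zero value disables the timer */
    return setitimer(ITIMER_REAL, &it, NULL);
}
```

The intended use is to call `watchdog_init()` once, `watchdog_arm()` when a frame arrives, and `watchdog_disarm()` right after the response is handed back to the kernel. In a multithreaded program `SIGALRM` may be delivered to any thread that does not block it, so the sampled stack is not guaranteed to be the Zigbee thread's unless the other threads mask the signal.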
Upvotes: 1