mkirche

BitBanging a MHz protocol with the MicroPython Inline Assembler on PyBoard

I'm trying to use a PyBoard v1.1 to drive Adafruit's NeoPixel LEDs (SK6812RGBW) using MicroPython's inline assembler.

Protocol

As can be seen in the linked data sheet, a single LED is driven by sending four 8-bit RGBW values. A 1 bit is 0.6 us high followed by 0.6 us low; a 0 bit is 0.3 us high followed by 0.9 us low. Every data bit therefore lasts 1.2 us and can be expressed as four 0.3 us slots (a 1 becomes high-high-low-low, a 0 becomes high-low-low-low), so the 32 data bits of one LED's 4-byte colour value become 128 slots lasting 38.4 us in total. The byte stream sent to the first LED also contains the values for all subsequent LEDs: each LED keeps its own 4 bytes and passes the rest on to the next one.
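A minimal sketch of that encoding, assuming the slot patterns described above (1 → 1100, 0 → 1000, most significant bit first). The function names are hypothetical, and the channel order is left to the caller as an assumption; check the data sheet for the order the SK6812RGBW actually expects.

```python
# Hypothetical helpers: each data bit becomes four 0.3 us slots.
ONE_SLOTS = 0b1100   # 0.6 us high, 0.6 us low
ZERO_SLOTS = 0b1000  # 0.3 us high, 0.9 us low


def encode_byte(value):
    """Expand one 8-bit colour value (MSB first) into 4 bytes of slots."""
    bits = 0
    for i in range(7, -1, -1):
        slots = ONE_SLOTS if (value >> i) & 1 else ZERO_SLOTS
        bits = (bits << 4) | slots
    return bits.to_bytes(4, "big")


def encode_pixel(channels):
    """Encode one LED's four channel bytes: 16 bytes = 128 slots = 38.4 us."""
    return b"".join(encode_byte(c) for c in channels)


def encode_strip(pixels):
    """Concatenate all LEDs; the first LED keeps its 16 bytes, forwards the rest."""
    return b"".join(encode_pixel(p) for p in pixels)
```

For example, `encode_byte(0x80)` yields `b"\xc8\x88\x88\x88"`: one long pulse followed by seven short ones.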

Implementation

The protocol can be implemented quite easily using the PyBoard's SPI interface. Once the data stream has been generated (effectively 16 bytes per LED) and the baud rate has been calculated (1 s / 0.3 us ≈ 3333333 Hz), one only needs to create a pyb.SPI instance and call its send method with the bytes as the argument.
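A sketch of that SPI approach, assuming the frame has already been encoded at 16 bytes per LED. `send_frame` is a hypothetical helper; `pyb` is imported inside it because that module only exists on the board. The SPI prescaler can only approximate the requested rate, so it is worth printing the SPI object to see the baud rate actually chosen.

```python
SLOT_US = 0.3                            # one SPI bit = one 0.3 us protocol slot
BAUDRATE = round(1 / (SLOT_US * 1e-6))   # about 3333333 Hz


def send_frame(bus, frame):
    """Hypothetical helper: send a pre-encoded frame over hardware SPI."""
    import pyb  # only available on the PyBoard

    spi = pyb.SPI(bus, pyb.SPI.MASTER, baudrate=BAUDRATE)
    # The prescaler only approximates BAUDRATE; print(spi) shows the real rate.
    spi.send(frame)
    return spi
```

On the board, `send_frame(1, frame)` would put the data out on MOSI of SPI bus 1 (X8) and `send_frame(2, frame)` on bus 2 (Y8), which is exactly why only two strips can be driven this way.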

The Task

Now to the task at hand: I want to drive three different LED strips with one PyBoard. However, there are only 2 SPI buses available. So I tried to bit-bang the protocol with pyb.Pin and loops, and quickly realized that wasn't going to work: the minimum toggle time was 54 us, which is just a bit shy of the 0.3 us I need...

Implementation V2

After trying some optimization steps, I turned to MicroPython's inline assembler. A few hours later I had managed to toggle a given pin at a breezy 23 ns, as measured with an oscilloscope. That was great, but I didn't need to mindlessly toggle pins; I needed to toggle pins according to a bit stream following an exact protocol. So another couple of hours later I finished the following implementation:

@micropython.asm_thumb
def send_bits_on_x9(r0):
    # r0 points to a 2-word array:
    #   word 0 holds the address of the bit array
    #   word 1 holds the length of the bit array

    # Store the GPIOB address in r3
    movwt(r3, stm.GPIOB)
    # Store the bit mask for PB6 (the pin X9)
    movw(r4, 1 << 6)

    # Load address into r5
    ldr(r5, [r0, 0])
    # Load array length into r6
    ldr(r6, [r0, 4])

    # Jump to condition evaluation
    b(loop_entry)

    # Main loop
    label(loop)

    # Get current "bit" word
    ldr(r0, [r5, 0])
    # Shift address to next "bit" for next time
    add(r5, r5, 4)

    # Evaluating the bit and toggling accordingly
    cmp(r0, 1)
    ite(eq)
    strh(r4, [r3, stm.GPIO_BSRRL])  # Turn LED on
    strh(r4, [r3, stm.GPIO_BSRRH])  # Turn LED off

    # Delay for a bit (busy-wait loop)
    movwt(r7, 8)  # 8 loop iterations; about 20948000 iterations take 1 s
    label(delay)
    sub(r7, r7, 1)
    cmp(r7, 0)
    bgt(delay)

    # Eval loop; using data array length as initial counter value
    sub(r6, r6, 1)
    label(loop_entry)
    cmp(r6, 0)
    bgt(loop)

PB6 is the CPU name of pin X9, which I use as the data line to the LEDs.
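To put the delay loop in perspective, here is the cycle budget per slot as a small calculation. It assumes the PyBoard v1.1 runs at its default 168 MHz (check with `pyb.freq()` on the board); the helper name is hypothetical. At that clock a 0.3 us slot is only about 50 CPU cycles, which must cover the load, compare, store, delay loop and loop bookkeeping together.

```python
CPU_HZ = 168000000  # assumption: PyBoard v1.1 default clock, not changed
SLOT_NS = 300       # one protocol slot, 0.3 us


def cycles_per_slot(cpu_hz=CPU_HZ, slot_ns=SLOT_NS):
    """CPU cycles available for each 0.3 us slot of the bit stream."""
    return cpu_hz * slot_ns / 1000000000
```

`cycles_per_slot()` gives about 50.4, so even a few stolen cycles per slot are visible on the scope.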

To run it, I embedded it in a demo Python script:

import array
import uctypes
import micropython
import stm

@micropython.asm_thumb
def send_bits_on_x9(r0):
    ...

send_buffer = array.array("i", [1, 0, 1, 1, 0, 0, 1, 0])
send_bits_on_x9(array.array("i", [uctypes.addressof(send_buffer), len(send_buffer)]))
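`send_bits_on_x9` reads one 32-bit word per slot and compares it with 1, so an encoded SPI-style byte stream has to be expanded into such an array first. A minimal sketch, assuming most-significant-bit-first order; `bytes_to_slot_words` is a hypothetical name. Note the memory cost: one LED's 16 bytes become 128 words, i.e. 512 bytes of RAM.

```python
import array


def bytes_to_slot_words(data):
    """Expand a slot byte stream into one 32-bit word (0 or 1) per slot, MSB first."""
    words = array.array("i")
    for byte in data:
        for i in range(7, -1, -1):
            words.append((byte >> i) & 1)
    return words
```

The result can then be passed the same way as `send_buffer` in the demo script, via `uctypes.addressof(words)` and `len(words)`.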

The Problem

This worked beautifully; however, when I used it in place of the SPI streamer, occasional artifacts showed up on the LEDs every couple of executions. Here is what it looked like on the oscilloscope:

[Image: oscilloscope log with artifact]

As can be seen, there is a spot where, for some reason, it seems to stop toggling for exactly 2 value bits:

[Image: missing edges penciled in]

This happens seemingly at random, at any part of the bit stream, sometimes starting with a rising edge and sometimes with a falling edge.

Question

Now, obviously, my question is why this happens. It doesn't happen with SPI, though I assume that the C implementation takes care not to let anything interrupt the stream. I tried disabling the garbage collector before calling send_bits_on_x9 and re-enabling it afterwards, but that didn't help. I also tried changing the number of delay cycles, but that didn't change anything either.
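A minimal sketch of the garbage-collector attempt described above, written with `try`/`finally` so the collector is re-enabled even if the call raises; the wrapper name is hypothetical. Note that `gc.disable()` only stops automatic collection; it does not mask hardware interrupts.

```python
import gc


def call_with_gc_disabled(fn, *args):
    """Run fn(*args) with automatic garbage collection switched off."""
    gc.disable()
    try:
        return fn(*args)
    finally:
        gc.enable()  # re-enable even if fn raises
```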

A second thing I noticed: when sending a number of trailing zero bytes (an 80 us reset period, as defined by the protocol), that period seemed to execute in about a quarter of the time it was supposed to. When I changed the trailing bytes to 0xff, they retained their intended duration, and the LEDs don't seem to mind.
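For reference, the number of trailing bytes that an 80 us reset period requires at 0.3 us per slot, as integer arithmetic. The helper name and defaults are hypothetical; 80 us / 0.3 us rounds up to 267 slots, which rounds up to 34 bytes.

```python
SLOT_NS = 300      # one protocol slot, 0.3 us
RESET_NS = 80000   # reset period from the protocol, 80 us


def reset_padding(slot_ns=SLOT_NS, reset_ns=RESET_NS, fill=0x00):
    """Bytes of padding that cover the reset period (ceiling division)."""
    slots = -(-reset_ns // slot_ns)    # 267 slots
    n_bytes = -(-slots // 8)           # 34 bytes
    return bytes([fill]) * n_bytes
```

`reset_padding(fill=0xFF)` produces the 0xff variant mentioned above.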

If anyone could point me to a resource other than the official inline assembler documentation, or even provide some insight, I'd appreciate it. Cheers!

Answers (1)

mosi

When writing time-sensitive code on an STM32 platform, you absolutely need to read the reference manual for that particular MCU (the PyBoard v1.1 uses an STM32F405).

Your problem is not Python itself, since you disabled garbage collection correctly.

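However, when code runs for any length of time, interrupts WILL eventually kick in: the CPU suspends the main code, jumps into the interrupt service routine (ISR), executes whatever needs to be done, and then returns to the main code. Any such pause in the middle of your bit loop stretches the current slot on the wire.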

Hence, the solution is simple with MicroPython's inline assembler: disable interrupts around the time-critical part.

@micropython.asm_thumb
def f():
  # start
  cpsid('i') # set the Priority Mask Register - disable interrupts.
  ...
  ...
  cpsie('i') #clear the Priority Mask Register - enable interrupts.

Warning! Do not expect this to work with 1-second delays in the middle of your function; with interrupts disabled for that long, the results are not guaranteed.

Better solution:

  1. Define time critical code in @micropython.asm_thumb with interrupts turned OFF
  2. Define delays and non-critical code in python or separate assembler parts
  3. mix and match together as required
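Since interrupts stay masked for the whole transfer, the critical window grows with strip length. A rough sketch of that arithmetic, assuming 0.3 us slots and 128 slots per LED (reset period excluded); the helper name is hypothetical. A hypothetical 60-LED strip keeps interrupts off for about 2.3 ms, so any interrupt source with a shorter period (for example a 1 ms system tick, which is an assumption about the firmware here) will be delayed for the whole transfer.

```python
SLOT_NS = 300         # one protocol slot, 0.3 us
SLOTS_PER_LED = 128   # 32 data bits x 4 slots


def masked_time_us(n_leds, slot_ns=SLOT_NS, slots_per_led=SLOTS_PER_LED):
    """Time in us that interrupts stay disabled while streaming n_leds pixels."""
    return n_leds * slots_per_led * slot_ns / 1000
```

`masked_time_us(1)` is 38.4 and `masked_time_us(60)` is 2304.0.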
