pranathi

Reputation: 393

Generating nanosecond delay in C on STM32

I am using an STM32F2 controller and interfacing it with an ST7036 LCD display via an 8-bit parallel interface.

The datasheet says there should be a 20-nanosecond delay between address hold and setup time.

How do I generate a 20-nanosecond delay in C?

Upvotes: 12

Views: 17888

Answers (4)

Heath Raftery

Reputation: 4149

There's lots of misleading info out there, but a ready solution is not too hard to come by. Let me summarise with an answer that suits any modern STM32.

First, do some one-time initialisation:

#include <stdint.h>
#include "stm32f2xx.h" //Device header: provides SystemCoreClock, SET_BIT, DWT and the core debug registers.

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

//For performance reasons, nanosecond delays are pre-calculated. To avoid pre-calculating kB of
//data, only the following quantities will be calculated. nsMAX is not a quantity and is only used
//to find the end of the enum. It must be the last entry.
typedef enum DELAY_nanoseconds
{ ns5, ns10, ns20, ns30, ns50, ns100, ns200, ns300, ns500, ns1000, ns2000, ns3000, ns5000, ns10000, nsMAX }
eDELAY_nanoseconds;

//This must be manually kept in sync with eDELAY_nanoseconds (checked at compile time below).
#define DELAY_NANOSECONDS \
{   5,   10,   20,   30,   50,   100,   200,   300,   500,   1000,   2000,   3000,   5000,   10000 }

const uint32_t NANOSECOND_TUNE = 50;
const uint32_t NANOSECOND_VALS[] = DELAY_NANOSECONDS;
_Static_assert(sizeof(NANOSECOND_VALS) / sizeof(NANOSECOND_VALS[0]) == nsMAX,
               "DELAY_NANOSECONDS is out of sync with eDELAY_nanoseconds");

//Cycle counts for each entry, filled in by DELAY_init().
uint32_t gCycsPerNanoseconds[nsMAX];

void DELAY_init(void)
{
  //== Prep for DELAY_nanoseconds() ==

  //Generate LUT. Eg. for sysclk = 180MHz, will produce 0 cycles for 0 to 5 ns, and then 1 cycle for 6 ns.
  //For sysclk = 4MHz, will produce 0 cycles up to 249ns, then 1 cycle for 250ns.
  for(eDELAY_nanoseconds i = 0; i < nsMAX; i++)
  {
    //Divide numerator and denominator by 1000 first before multiplying, to avoid numerator overflow.
    gCycsPerNanoseconds[i] = NANOSECOND_VALS[i] * (SystemCoreClock/1000) / (1000000000/1000);
    gCycsPerNanoseconds[i] -= MIN(gCycsPerNanoseconds[i], NANOSECOND_TUNE); //adjust for function overhead
  }

  //Could not find a good reference for this, so pieced together from multiple sources.
  //If your CMSIS headers don't define DCB (older versions, as shipped with many STM32 packages),
  //use CoreDebug->DEMCR and CoreDebug_DEMCR_TRCENA_Msk instead.
  SET_BIT(DCB->DEMCR, DCB_DEMCR_TRCENA_Msk); //Enable the DWT.
  SET_BIT(DWT->CTRL, DWT_CTRL_CYCCNTENA_Msk); //Enable the CYCCNT function of the DWT.
}

This pre-calculates a lookup table to avoid divisions when doing the actual delay. The NANOSECOND_TUNE value can be determined empirically; it's easy to be way off at this scale, so this is a simple, parameterised way to get some useful accuracy.
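As a quick check of what the table ends up holding, the DELAY_init() arithmetic can be run on a PC. The sketch below copies that calculation into a hypothetical cycles_for_ns() helper (the name and this MIN() definition are illustrative, not part of the answer), taking the core clock and the tune value as parameters instead of SystemCoreClock and NANOSECOND_TUNE.

```c
#include <assert.h>  /* used by the checks that exercise this helper */
#include <stdint.h>

/* Illustrative MIN(); DELAY_init() assumes one is available. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Host-side copy of the DELAY_init() arithmetic: cycles needed for `ns`
   nanoseconds at `core_clock_hz`, minus a fixed overhead `tune`. */
static uint32_t cycles_for_ns(uint32_t ns, uint32_t core_clock_hz, uint32_t tune)
{
    /* Scale both sides down by 1000 first so the product fits in 32 bits
       (10000 ns * 180000 = 1.8e9 < 2^32). Integer division floors. */
    uint32_t cycles = ns * (core_clock_hz / 1000u) / (1000000000u / 1000u);
    cycles -= MIN(cycles, tune); /* clamp at 0 instead of wrapping */
    return cycles;
}
```

For example, cycles_for_ns(20, 180000000u, 0) gives 3 cycles, but with the default tune of 50 every table entry up to ns200 drops to 0 at 180 MHz, so at that clock the short delays consist entirely of call and loop overhead.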

It also enables the DWT to give us a cycle-by-cycle counter. This is not well documented in one place, but the simple initialisation here seems fairly reliable. Note that on some platforms the DWT is write-protected, so an unlock code may need to be written here too (for example, 0xC5ACCE55 to DWT->LAR on Cortex-M7 parts).

Now the function proper:

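//Defined alongside DELAY_init(); declare it wherever this inline function is visible.
extern uint32_t gCycsPerNanoseconds[nsMAX];

static inline __attribute__ ((always_inline)) void DELAY_nanoseconds(eDELAY_nanoseconds ns)
{
  //Sacrifice some time up front setting this, to make each loop faster.
  DWT->CYCCNT = 0;

  while(DWT->CYCCNT <= gCycsPerNanoseconds[ns]) //Note <= because gCycsPerNanoseconds is floored.
    ;
}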

Nothing to it. To keep the while loop tight, the CYCCNT value is zeroed first. Then we just need to check if we've got to our pre-calculated sentinel value.
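Zeroing CYCCNT keeps the loop short, but it also resets a counter that other code (profiling, another delay) may be relying on. A common alternative is to snapshot the counter and compare the unsigned difference, which stays correct across the 32-bit wrap because uint32_t arithmetic is modulo 2^32. A minimal host-testable sketch of that comparison follows; cycles_elapsed() and delay_expired() are hypothetical helpers, and on the target the `now` argument would be DWT->CYCCNT.

```c
#include <assert.h>  /* used by the checks that exercise these helpers */
#include <stdbool.h>
#include <stdint.h>

/* Unsigned subtraction is defined modulo 2^32, so this gives the right
   answer even if the counter wrapped between the two reads. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* True once more than `cycles` ticks have passed since `start`.
   ">" mirrors the answer's "<=" loop condition (floored cycle counts). */
static inline bool delay_expired(uint32_t start, uint32_t now, uint32_t cycles)
{
    return cycles_elapsed(start, now) > cycles;
}

/* On target (not compiled here), the loop would read the live counter:
 *   uint32_t start = DWT->CYCCNT;
 *   while (!delay_expired(start, DWT->CYCCNT, gCycsPerNanoseconds[ns]))
 *     ;
 */
```

The trade-off is an extra subtraction per iteration, which is noticeable at the 20 ns scale; that is why the answer zeroes the counter instead.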

See also here for the microsecond equivalent.

Upvotes: 0

bunkerdive

Reputation: 2101

Use stopwatch_delay(4) below to get approximately 24 ns of delay at 168 MHz. It uses the DWT_CYCCNT register of the Cortex-M core (at address 0xE0001004), which is designed specifically to count CPU clock cycles.
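The 4-tick figure is 20 ns multiplied by the clock rate and rounded up: 20 ns × 168 MHz = 3.36 ticks, so 4 ticks (about 23.8 ns). A small sketch of that conversion, using a hypothetical ns_to_ticks() helper that rounds up so the requested delay is never undershot:

```c
#include <assert.h>  /* used by the checks that exercise this helper */
#include <stdint.h>

/* Convert nanoseconds to CPU ticks, rounding up so the delay is never
   shorter than requested. The 64-bit product cannot overflow for any
   pair of 32-bit inputs. */
static uint32_t ns_to_ticks(uint32_t ns, uint32_t clk_hz)
{
    uint64_t product = (uint64_t)ns * clk_hz;
    return (uint32_t)((product + 999999999u) / 1000000000u);
}
```

At 120 MHz (the STM32F2's maximum) the same 20 ns works out to 3 ticks. The polling loop in stopwatch_delay() also takes several cycles per iteration, so the real delay can overshoot; measure it with the stopwatch as described.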

To verify the delay accuracy (see main), you can call STOPWATCH_START, run stopwatch_delay(ticks), then call STOPWATCH_STOP and verify with CalcNanosecondsFromStopwatch(m_nStart, m_nStop). Adjust ticks as needed.

#include <stdint.h>
#include <stdio.h>

uint32_t m_nStart;               //DEBUG Stopwatch start cycle counter value
uint32_t m_nStop;                //DEBUG Stopwatch stop cycle counter value

#define DEMCR_TRCENA    0x01000000

/* Core Debug registers */
#define DEMCR           (*((volatile uint32_t *)0xE000EDFC))
#define DWT_CTRL        (*(volatile uint32_t *)0xe0001000)
#define CYCCNTENA       (1<<0)
#define DWT_CYCCNT      ((volatile uint32_t *)0xE0001004)
#define CPU_CYCLES      (*DWT_CYCCNT)
#define CLK_SPEED         168000000 // EXAMPLE for a 168 MHz Cortex-M4 (STM32F2 runs at up to 120 MHz), EDIT as needed

#define STOPWATCH_START { m_nStart = CPU_CYCLES;}
#define STOPWATCH_STOP  { m_nStop = CPU_CYCLES;}


static inline void stopwatch_reset(void)
{
    /* Enable DWT */
    DEMCR |= DEMCR_TRCENA;
    *DWT_CYCCNT = 0;
    /* Enable CPU cycle counter */
    DWT_CTRL |= CYCCNTENA;
}

static inline void stopwatch_delay(uint32_t ticks)
{
    uint32_t start_ticks = CPU_CYCLES;

    /* Compare elapsed ticks rather than an absolute end value, so the
       loop still terminates if the 32-bit counter wraps meanwhile. */
    while ((CPU_CYCLES - start_ticks) < ticks)
        ;
}

// WARNING: 1000 * nDiffTicks overflows 32 bits after ~4.29M ticks,
// so this is ONLY VALID FOR <25ms measurements at 168 MHz!
uint32_t CalcNanosecondsFromStopwatch(uint32_t nStart, uint32_t nStop)
{
    uint32_t nDiffTicks;
    uint32_t nSystemCoreTicksPerMicrosec;

    // Convert (clk speed per sec) to (clk speed per microsec)
    nSystemCoreTicksPerMicrosec = CLK_SPEED / 1000000;

    // Elapsed ticks (unsigned subtraction handles one counter wrap)
    nDiffTicks = nStop - nStart;

    // Elapsed nanosec = 1000 * (ticks-elapsed / clock-ticks in a microsec)
    return 1000 * nDiffTicks / nSystemCoreTicksPerMicrosec;
}

int main(void)
{
    uint32_t timeDiff = 0;
    stopwatch_reset();

    // =============================================
    // Example: use a delay, and measure how long it took
    STOPWATCH_START;
    stopwatch_delay(168000); // 168k ticks is 1ms for 168MHz core
    STOPWATCH_STOP;

    timeDiff = CalcNanosecondsFromStopwatch(m_nStart, m_nStop);
    printf("My delay measured to be %lu nanoseconds\n", (unsigned long)timeDiff);

    // =============================================
    // Example: measure function duration in nanosec
    STOPWATCH_START;
    // run_my_function() => do something here
    STOPWATCH_STOP;

    timeDiff = CalcNanosecondsFromStopwatch(m_nStart, m_nStop);
    printf("My function took %lu nanoseconds\n", (unsigned long)timeDiff);

    return 0;
}
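The 25 ms limit comes from 1000 * nDiffTicks overflowing 32 bits. If longer measurements are needed, a 64-bit intermediate removes that limit. The sketch below is a hypothetical variant (calc_ns_from_stopwatch64() is not part of the answer) that also divides by the exact clock in Hz rather than by whole ticks per microsecond.

```c
#include <assert.h>  /* used by the checks that exercise this helper */
#include <stdint.h>

/* Variant of CalcNanosecondsFromStopwatch() with a 64-bit intermediate,
   so the multiplication cannot overflow. Unsigned subtraction still
   handles a single wrap of the 32-bit counter between start and stop. */
static uint64_t calc_ns_from_stopwatch64(uint32_t start, uint32_t stop, uint32_t clk_hz)
{
    uint32_t diff_ticks = stop - start;
    return (uint64_t)diff_ticks * 1000000000u / clk_hz;
}
```

The remaining limit is the counter itself: at 168 MHz the 32-bit CYCCNT wraps roughly every 25.6 s, so longer intervals cannot be measured with a single start/stop pair.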

Upvotes: 16

asund

Reputation: 11

You should look into the FSMC peripheral available on your chip. While the configuration might be complicated, especially if you're not dropping in a memory part it was designed for, you might find that your parallel-interfaced device maps pretty well to one of the memory interface modes.

External memory controllers like this have many configurable timing options to support the range of memory chips out there, so you should be able to guarantee the timings your datasheet requires.

The nice benefit of doing this is that your LCD will then look like any other memory-mapped peripheral, abstracting away the lower-level interfacing details.
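To illustrate the memory-mapped view: on STM32F2/F4 parts, FSMC Bank 1 sub-bank 1 starts at 0x60000000, and with an 8-bit bus AHB address bit n appears on pin FSMC_A[n] (with a 16-bit bus the address is shifted by one). If the LCD's RS line were wired to one of those address pins (the pin choice is a hypothetical example), a write to the base address would be an instruction (RS low) and a write to base + (1 << n) would be data (RS high). The sketch below computes that data address; lcd_data_address() and FSMC_BANK1_1_BASE are illustrative names, not HAL definitions.

```c
#include <assert.h>  /* used by the checks that exercise this helper */
#include <stdint.h>

/* FSMC Bank 1, NOR/PSRAM sub-bank 1 base address (illustrative name). */
#define FSMC_BANK1_1_BASE 0x60000000u

/* Address whose write drives FSMC_A[rs_line] high, i.e. RS = 1 (data),
   assuming (hypothetically) the LCD's RS pin is wired to that line.
   With a 16-bit bus, HADDR[25:1] drives FSMC_A[24:0], so the byte
   offset doubles. */
static uint32_t lcd_data_address(unsigned rs_line, unsigned bus_width_bits)
{
    unsigned shift = (bus_width_bits == 16u) ? rs_line + 1u : rs_line;
    return FSMC_BANK1_1_BASE + (1u << shift);
}

/* On target (8-bit bus, RS on FSMC_A[RS_LINE]), the LCD is two locations:
 *   #define LCD_CMD  (*(volatile uint8_t *)FSMC_BANK1_1_BASE)
 *   #define LCD_DATA (*(volatile uint8_t *)(FSMC_BANK1_1_BASE + (1u << RS_LINE)))
 *   LCD_CMD  = cmd;   // RS low: instruction
 *   LCD_DATA = 'A';   // RS high: character data
 */
```

The setup and hold requirements then go into the FSMC timing register (in HCLK cycles) instead of hand-written delays.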

Upvotes: 1

Aki Suihkonen

Reputation: 20017

The first specification I found for the STM32F2 assumes a clock frequency of 120 MHz. That's about 8 ns per clock cycle, so you would need about three single-cycle instructions between successive write or read/write operations. In C, a++; will probably do (if a is located on the stack and the compiler doesn't optimise it away).
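The arithmetic behind this estimate, assuming the 120 MHz clock:

```latex
T_{\text{clk}} = \frac{1}{120\,\text{MHz}} \approx 8.33\,\text{ns},
\qquad
N = \left\lceil \frac{20\,\text{ns}}{8.33\,\text{ns}} \right\rceil = \lceil 2.4 \rceil = 3 \text{ cycles}
```

Three cycles give 3 × 8.33 ns = 25 ns, which clears the 20 ns requirement.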

Upvotes: 9
