Alex Lop.
Alex Lop.

Reputation: 6875

Does Clang misunderstand the 'const' pointer specifier?

In the code below I saw that clang fails to perform better optimisation without implicit restrict pointer specifier:

#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>

typedef struct {
    uint32_t        event_type;
    uintptr_t       param;
} event_t;

typedef struct
{
    event_t                     *queue;
    size_t                      size;
    uint16_t                    num_of_items;
    uint8_t                     rd_idx;
    uint8_t                     wr_idx;
} queue_t;

static bool queue_is_full(const queue_t *const queue_ptr)
{
    return queue_ptr->num_of_items == queue_ptr->size;
}

static size_t queue_get_size_mask(const queue_t *const queue_ptr)
{
    return queue_ptr->size - 1;
}

int queue_enqueue(queue_t *const queue_ptr, const event_t *const event_ptr)
{
    if(queue_is_full(queue_ptr))
    {
        return 1;
    }

    queue_ptr->queue[queue_ptr->wr_idx++] = *event_ptr;
    queue_ptr->num_of_items++;
    queue_ptr->wr_idx &= queue_get_size_mask(queue_ptr);

    return 0;
}

I compiled this code with clang version 11.0.0 (clang-1100.0.32.5)

clang -O2 -arch armv7m -S test.c -o test.s

In the disassembled file I saw that the generated code re-reads the memory:

_queue_enqueue:
        .cfi_startproc
@ %bb.0:
        ldrh    r2, [r0, #8]            ---> reads the queue_ptr->num_of_items
        ldr     r3, [r0, #4]            ---> reads the queue_ptr->size
        cmp     r3, r2
        itt     eq
        moveq   r0, #1
        bxeq    lr
        ldrb    r2, [r0, #11]           ---> reads the queue_ptr->wr_idx
        adds    r3, r2, #1
        strb    r3, [r0, #11]           ---> stores the queue_ptr->wr_idx + 1
        ldr.w   r12, [r1]
        ldr     r3, [r0]
        ldr     r1, [r1, #4]
        str.w   r12, [r3, r2, lsl #3]
        add.w   r2, r3, r2, lsl #3
        str     r1, [r2, #4]
        ldrh    r1, [r0, #8]            ---> !!! re-reads the queue_ptr->num_of_items
        adds    r1, #1
        strh    r1, [r0, #8]
        ldrb    r1, [r0, #4]            ---> !!! re-reads the queue_ptr->size (only the first byte)
        ldrb    r2, [r0, #11]           ---> !!! re-reads the queue_ptr->wr_idx
        subs    r1, #1
        ands    r1, r2
        strb    r1, [r0, #11]           ---> !!! stores the updated queue_ptr->wr_idx once again after applying the mask
        movs    r0, #0
        bx      lr
        .cfi_endproc
                                        @ -- End function

After adding the restrict keyword to the pointers, these unneeded re-reads just vanished:

int queue_enqueue(queue_t * restrict const queue_ptr, const event_t * restrict const event_ptr)

I know that in clang, by default strict aliasing is disabled. But in this case, event_ptr pointer is defined as const so its object's content cannot be modified by this pointer, thus it cannot affect the content to which queue_ptr points (assuming the case when the objects overlap in the memory), right?

So is this a compiler optimisation bug or there is indeed some weird case when the object pointed by queue_ptr can be affected by event_ptr assuming this declaration:

int queue_enqueue(queue_t *const queue_ptr, const event_t *const event_ptr)

By the way, I tried to compile the same code for x86 target and inspected similar optimisation issue.


The generated assembly with the restrict keyword, doesn't contain the re-reads:

_queue_enqueue:
        .cfi_startproc
@ %bb.0:
        ldr     r3, [r0, #4]
        ldrh    r2, [r0, #8]
        cmp     r3, r2
        itt     eq
        moveq   r0, #1
        bxeq    lr
        push    {r4, r6, r7, lr}
        .cfi_def_cfa_offset 16
        .cfi_offset lr, -4
        .cfi_offset r7, -8
        .cfi_offset r6, -12
        .cfi_offset r4, -16
        add     r7, sp, #8
        .cfi_def_cfa r7, 8
        ldr.w   r12, [r1]
        ldr.w   lr, [r1, #4]
        ldrb    r1, [r0, #11]
        ldr     r4, [r0]
        subs    r3, #1
        str.w   r12, [r4, r1, lsl #3]
        add.w   r4, r4, r1, lsl #3
        adds    r1, #1
        ands    r1, r3
        str.w   lr, [r4, #4]
        strb    r1, [r0, #11]
        adds    r1, r2, #1
        strh    r1, [r0, #8]
        movs    r0, #0
        pop     {r4, r6, r7, pc}
        .cfi_endproc
                                        @ -- End function

Addition:

After some discussion with Lundin in the comments to his answer, I got the impression that the re-reads could be caused because the compiler would assume that queue_ptr->queue might potentially point to *queue_ptr itself. So I changed the queue_t struct to contain array instead of the pointer:

typedef struct
{
    event_t                     queue[256]; // changed from pointer to array with max size
    size_t                      size;
    uint16_t                    num_of_items;
    uint8_t                     rd_idx;
    uint8_t                     wr_idx;
} queue_t;

However the re-reads remained as previously. I still can't understand what could make the compiler think that queue_t fields may be modified and thus require re-reads... The following declaration eliminates the re-reads:

int queue_enqueue(queue_t * restrict const queue_ptr, const event_t *const event_ptr)

But why queue_ptr has to be declared as restrict pointer to prevent the re-reads I don't understand (unless it is a compiler optimization "bug").

P.S.

I also couldn't find any link to file/report an issue on clang that doesn't cause the compiler to crash...

Upvotes: 0

Views: 763

Answers (4)

chill
chill

Reputation: 16888

[talking about the original program]

This is caused by deficiency in the TBAA metadata, generated by Clang.

If you emit LLVM IR with -S -emit-llvm you'll see (snipped for brevity):

...

  %9 = load i8, i8* %wr_idx, align 1, !tbaa !12
  %10 = trunc i32 %8 to i8
  %11 = add i8 %10, -1
  %conv4 = and i8 %11, %9
  store i8 %conv4, i8* %wr_idx, align 1, !tbaa !12
  br label %return

...

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 1, !"min_enum_size", i32 4}
!2 = !{!"clang version 10.0.0 (/home/chill/src/llvm-project 07da145039e1a6a688fb2ac2035b7c062cc9d47d)"}
!3 = !{!4, !9, i64 8}
!4 = !{!"queue", !5, i64 0, !8, i64 4, !9, i64 8, !6, i64 10, !6, i64 11}
!5 = !{!"any pointer", !6, i64 0}
!6 = !{!"omnipotent char", !7, i64 0}
!7 = !{!"Simple C/C++ TBAA"}
!8 = !{!"int", !6, i64 0}
!9 = !{!"short", !6, i64 0}
!10 = !{!4, !8, i64 4}
!11 = !{!4, !5, i64 0}
!12 = !{!4, !6, i64 11}

See the TBAA metadata !4: this is the type descriptor for queue_t (btw, I added names to structs, e.g. typedef struct queue ...) you may see empty string there). Each element in the description corresponds to struct fields and look at !5, which is the event_t *queue field: it's "any pointer"! At this point we've lost all the information about the actual type of the pointer, which tells me the compiler would assume writes through this pointer can modify any memory object.

That said, there's a new form for TBAA metadata, which is more precise (still has deficiencies, but for that later ...)

Compile the original program with -Xclang -new-struct-path-tbaa. My exact command line was (and I've included stddef.h instead of stdlib.h since that development build with no libc):

./bin/clang -I lib/clang/10.0.0/include/ -target armv7m-eabi -O2 -Xclang -new-struct-path-tbaa  -S queue.c

The resulting assembly is (again, some fluff snipped):

queue_enqueue:
    push    {r4, r5, r6, r7, lr}
    add r7, sp, #12
    str r11, [sp, #-4]!
    ldrh    r3, [r0, #8]
    ldr.w   r12, [r0, #4]
    cmp r12, r3
    bne .LBB0_2
    movs    r0, #1
    ldr r11, [sp], #4
    pop {r4, r5, r6, r7, pc}
.LBB0_2:
    ldrb    r2, [r0, #11]                   ; load `wr_idx`
    ldr.w   lr, [r0]                        ; load `queue` member
    ldrd    r6, r1, [r1]                    ; load data pointed to by `event_ptr`
    add.w   r5, lr, r2, lsl #3              ; compute address to store the event
    str r1, [r5, #4]                        ; store `param`
    adds    r1, r3, #1                      ; increment `num_of_items`
    adds    r4, r2, #1                      ; increment `wr_idx`
    str.w   r6, [lr, r2, lsl #3]            ; store `event_type`
    strh    r1, [r0, #8]                    ; store new value for `num_of_items`
    sub.w   r1, r12, #1                     ; compute size mask
    ands    r1, r4                          ; bitwise and size mask with `wr_idx`
    strb    r1, [r0, #11]                   ; store new value for `wr_idx`
    movs    r0, #0
    ldr r11, [sp], #4
    pop {r4, r5, r6, r7, pc}

Looks good, isn't it! :D

I mentioned earlier there are deficiencies with the "new struct path", but for that: on the mailing list.

PS. I'm afraid there's no general lesson to be learned in this case. In principle, the more information one is able to give the compiler - the better: things like restrict, strong typing (not gratuitous casts, type punning, etc), relevant function and variable attributes ... but not in this case, the original program already contained all the necessary information. It is just a compiler deficiency and the best way to tackle those is to raise awareness: ask on the mailing list and/or file bug reports.

Upvotes: 2

Myst
Myst

Reputation: 19221

As you know, your code appears to modify the data, invalidating the const state:

queue_ptr->num_of_items++; // this stores data in the memory

Without the restrict keyword, the compiler must assume that the two types might share the same memory space.

This is required in the updated example because event_t is a member of queue_t and strict aliasing applies on:

... an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or...

On the original example there are a number of reasons the types might be considered to alias, leading to the same result (i.e., the use of a char pointer and the fact that types might be considered compatible "enough" on some architectures if not all).

Hence, the compiler is required to reload the memory after it was mutated to avoid possible conflicts.

The const keyword doesn't really enter into this, because a mutation happens through a pointer that might point to the same memory address.

(EDIT) For your convenience, here's the full rule regarding access to a variable:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types (88):

— a type compatible with the effective type of the object,

— a qualified version of a type compatible with the effective type of the object,

— a type that is the signed or unsigned type corresponding to the effective type of the object,

— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

— a character type.

(88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.

P.S.

The _t suffix is reserved by POSIX. You might consider using a different suffix.

It's common practice to use _s for structs and _u for unions.

Upvotes: 1

Lundin
Lundin

Reputation: 213832

The event_t member of queue_ptr could point at the same memory as event_ptr. Compilers tend to produce less efficient code when they can't rule out that two pointers point at the same memory. So there's nothing strange with restrict leading to better code.

Const qualifiers don't really matter, because these were added by the function and the original type could be modifiable elsewhere. In particular, the * const doesn't add anything because the pointer is already a local copy of the original pointer, so nobody including the caller cares if the function modifies that local copy or not.

"Strict aliasing" rather refers to the cases where the compiler can cut corners like when assuming that a uint16_t* can't point at a uint8_t* etc. But in your case you have two completely compatible types, one of them is merely wrapped in an outer struct.

Upvotes: 2

bolov
bolov

Reputation: 75707

As far as I can tell, yes, in your code queue_ptr pointee's contents cannot be modified. Is is it an optimization bug? It's a missed optimization opportunity, but I wouldn't call it a bug. It doesn't "misunderstand" const, it just doesn't have/doesn't do the necessary analyses to determine it cannot be modified for this specific scenario.

As a side note: queue_is_full(queue_ptr) can modify the contents of *queue_ptr even if it has const queue_t *const param because it can legally cast away the constness since the original object is not const. That being said, the definition of quueue_is_full is visible and available to the compiler so it can ascertain it indeed does not.

Upvotes: 1

Related Questions