chenzhongpu

Reputation: 6871

What is the usage of vhadd_s8 in Neon intrinsics?

I find the behavior of the halving-add intrinsics quite strange. For example, int8x8_t vhadd_s8(int8x8_t a, int8x8_t b):

Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.

Can anyone explain its usage scenario? The following is an example in Rust:

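    use std::arch::aarch64::{vhadd_s8, vld1_s8};

    fn main() {
        let a_v: Vec<i8> = vec![8; 8];
        let b_v: Vec<i8> = vec![1; 8];
        // SAFETY: each Vec holds exactly 8 elements, so both 8-byte loads are in bounds.
        unsafe {
            let a = vld1_s8(a_v.as_ptr());
            let b = vld1_s8(b_v.as_ptr());
            let c = vhadd_s8(a, b);
            println!("{:?}", c);
        }
    }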

Shifting 8 right by one bit gives 4, and shifting 1 right by one bit gives 0, so every lane of the result is 4. In which scenario would users expect 4 as the result?

Upvotes: 0

Views: 53

Answers (1)

Nate Eldredge

Reputation: 58673

Remember that a right shift of one bit is division by 2 (rounding toward minus infinity). So this adds the inputs and divides the result by 2 - that is, it averages the inputs (arithmetic mean).
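To make the rounding direction concrete, here is a small scalar sketch in plain Rust (no Neon required). It relies on Rust's `>>` being an arithmetic shift on signed integer types; the helper names `shift_half` and `trunc_half` are made up for this illustration.

```rust
/// Halve by shifting: rounds toward minus infinity.
fn shift_half(x: i8) -> i8 {
    x >> 1
}

/// Halve with `/`: Rust's integer division truncates toward zero.
fn trunc_half(x: i8) -> i8 {
    x / 2
}

fn main() {
    // For non-negative inputs the two agree.
    println!("{} {}", shift_half(9), trunc_half(9));
    // For negative odd inputs they differ by one.
    println!("{} {}", shift_half(-9), trunc_half(-9));
}
```

The difference only matters for negative odd sums, but it is worth keeping in mind because vhadd_s8 works on signed lanes.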

Regarding your example: note that the instruction does the addition first and then shifts the result, rather than shifting first and then adding as in your example. So if the inputs are 8 and 1, they are first added to give 9, then shifted to give 4. Indeed, the mathematical average of 8 and 1 is 4.5, which then rounds down to 4.
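The add-then-shift order can be sketched as a scalar model of the eight lanes. This is only an approximation of what the instruction computes, written in portable Rust so it runs on any target; `hadd_lanes` is a hypothetical helper, not an intrinsic.

```rust
/// Scalar model of a signed halving add over eight i8 lanes.
/// Hypothetical helper for illustration; the real intrinsic is vhadd_s8.
fn hadd_lanes(a: [i8; 8], b: [i8; 8]) -> [i8; 8] {
    let mut out = [0i8; 8];
    for i in 0..8 {
        // Add first, in a wider type, then shift the sum right by one bit.
        let sum = i16::from(a[i]) + i16::from(b[i]);
        // The halved sum always lies in -128..=127, so the cast is lossless.
        out[i] = (sum >> 1) as i8;
    }
    out
}

fn main() {
    // Same inputs as in the question: every lane is (8 + 1) >> 1.
    let c = hadd_lanes([8; 8], [1; 8]);
    println!("{:?}", c);
}
```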

This instruction actually does a little better than simply combining an 8-bit add and a shift instruction, in that it ensures the addition doesn't overflow (presumably by keeping a 9-bit result for the addition internally). Suppose the inputs were 100 and 110. This instruction would give the mathematically correct answer of 105. However, if you do an 8-bit add by itself, it will overflow, giving a result of -46; if you shift this, you get -23.
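The overflow case can be reproduced with scalar arithmetic. In this sketch the wider intermediate is modeled with i16 (an assumption standing in for whatever the hardware does internally), and the plain 8-bit add is modeled with `wrapping_add`; both function names are made up for the example.

```rust
/// Halving add modeled with a wider intermediate, so the sum cannot overflow.
fn hadd_wide(a: i8, b: i8) -> i8 {
    ((i16::from(a) + i16::from(b)) >> 1) as i8
}

/// Separate 8-bit add (wrapping on overflow) followed by a shift.
fn add_then_shift_8bit(a: i8, b: i8) -> i8 {
    a.wrapping_add(b) >> 1
}

fn main() {
    // 100 + 110 = 210 does not fit in an i8, so the 8-bit add wraps to -46.
    println!("wrapped sum:   {}", 100i8.wrapping_add(110));
    println!("8-bit + shift: {}", add_then_shift_8bit(100, 110));
    println!("halving add:   {}", hadd_wide(100, 110));
}
```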

(If the shift was done first, as you had it in your example, it would avoid overflow, but also give less accurate results. For instance, suppose the inputs were 5 and 9. If we shifted first, we would get 2 and 4, whose sum is 6. But the mathematical average of 5 and 9 is actually 7, and this is what we get if we add first and then shift.)
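A quick scalar comparison of the two orders, again as a sketch with hypothetical helper names. The results differ only when both inputs are odd, because each shift then discards a low bit that the sum would have carried.

```rust
/// Add first (in a wider type), then shift: the halving-add order.
fn add_then_shift(a: i8, b: i8) -> i8 {
    ((i16::from(a) + i16::from(b)) >> 1) as i8
}

/// Shift each input first, then add: cannot overflow, but loses precision.
fn shift_then_add(a: i8, b: i8) -> i8 {
    (a >> 1) + (b >> 1)
}

fn main() {
    for (a, b) in [(5i8, 9i8), (8, 1), (100, 110)] {
        println!(
            "a={a} b={b} add-then-shift={} shift-then-add={}",
            add_then_shift(a, b),
            shift_then_add(a, b)
        );
    }
}
```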

Upvotes: 3
