Jens

Reputation: 171

Efficient method for imposing (some cases of) periodic boundary conditions on floats?

Some cases of periodic boundary conditions (PBC) can be imposed very efficiently on integers by simply doing:

myWrappedWithinPeriodicBoundary = myUIntValue & mask

This works when the boundary is the half-open range [0, upperBound), where the (exclusive) upperBound is 2^exp, so that

mask = (1 << exp) - 1

For example:

let pbcUpperBoundExp = 2 // so the periodic boundary will be [0, 4)
let mask = (1 << pbcUpperBoundExp) - 1
for x in -7 ... 7 { print(x & mask, terminator: " ") }

(in Swift) will print:

1 2 3 0 1 2 3 0 1 2 3 0 1 2 3

Question: Is there any (roughly similar) efficient method for imposing (some cases of) PBCs on floating-point numbers (32- or 64-bit IEEE 754)?

Upvotes: 2

Views: 566

Answers (2)

Jens

Reputation: 171

I'm adding this answer to my own question since it describes the best solution I have found at the time of writing. It's in Swift 4.1 (it should be straightforward to translate into C) and it's been tested in various use cases:

extension BinaryFloatingPoint {
    /// Returns the value after restricting it to the periodic boundary
    /// condition [0, 1).
    /// See https://forums.swift.org/t/why-no-fraction-in-floatingpoint/10337
    @_transparent
    func wrappedToUnitRange() -> Self {
        let fract = self - self.rounded(.down)
        // Have to clamp to just below 1 because very small negative values
        // will otherwise return an out-of-range result of 1.0.
        // This branch turns out to be faster than
        // `return min(fract, Self(1).nextDown)`:
        if fract >= 1.0 { return Self(1).nextDown } else { return fract }
    }
    @_transparent
    func wrapped(to range: Range<Self>) -> Self {
        let measure = range.upperBound - range.lowerBound
        let recipMeasure = Self(1) / measure
        let scaled = (self - range.lowerBound) * recipMeasure
        return scaled.wrappedToUnitRange() * measure + range.lowerBound
    }
    @_transparent
    func wrappedIteratively(to range: Range<Self>) -> Self {
        var v = self
        let measure = range.upperBound - range.lowerBound
        while v >= range.upperBound { v = v - measure }
        while v < range.lowerBound { v = v + measure }
        return v
    }
}
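
A quick usage illustration (my own example values, not part of the benchmarked code):

let a = 7.25.wrapped(to: 0.0 ..< 2.0)          // 1.25
let b = (-0.25).wrapped(to: 0.0 ..< 1.0)       // 0.75
let c = (-1e-8 as Float).wrappedToUnitRange()  // clamps to just below 1.0
print(a, b, c)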

On my MacBook Pro with a 2 GHz Intel Core i7, a hundred million (probably inlined) calls to wrapped(to:) on random (finite) Double values take 0.6 seconds, which is about 166 million calls per second (single threaded). Whether the range is statically known or not, or has bounds or a measure that is a power of two, etc., makes some difference, but not as much as one might expect.

A hundred million calls to wrappedToUnitRange() take about 0.2 seconds, meaning 500 million calls per second on my system.

Given the right scenario, wrappedIteratively(to:) is as fast as wrappedToUnitRange().

The timings were made by comparing a baseline test (which does not wrap the value, but still uses it to compute e.g. a simple xor checksum) to the same test with the value wrapped. The difference between the two is the time I have given for the wrapping calls.

I used Swift development toolchain 2018-02-21, compiling with -O -whole-module-optimization -static-stdlib -gnone, and care has been taken to make the tests relevant, i.e. preventing dead code removal, using truly random input of different distributions, etc. Writing the wrapping functions generically, as in this extension on BinaryFloatingPoint, turned out to be optimized into code equivalent to separate specialized versions written for e.g. Float and Double.
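
For reference, here is a minimal sketch of that style of harness (the input generation, loop count and Date-based timer are my own simplification, not the original benchmark code):

import Foundation

let inputs = (0 ..< 1024).map { _ in Double.random(in: -1e6 ... 1e6) }
var checksum: UInt64 = 0
let start = Date()
for i in 0 ..< 100_000_000 {
    // For the baseline run, replace the next line with
    // `let v = inputs[i % inputs.count]`; the difference between the
    // two runs is the cost attributed to the wrapping call.
    let v = inputs[i % inputs.count].wrapped(to: 0.0 ..< 1.0)
    checksum ^= v.bitPattern // keeps the optimizer from removing the loop
}
print("checksum:", checksum, "seconds:", Date().timeIntervalSince(start))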

It would be interesting to see someone more skilled than me investigating this further (C or Swift or any other language doesn't matter).

EDIT: For anyone interested, here are some versions for simd float2:

extension float2 {
    @_transparent
    func wrappedInUnitRange() -> float2 {
        return simd.fract(self)
    }
    @_transparent
    func wrappedToMinusOneToOne() -> float2 {
        let scaled = (self + float2(1, 1)) * float2(0.5, 0.5)
        let scaledFract = scaled - floor(scaled)
        let wrapped = simd_muladd(scaledFract, float2(2, 2), float2(-1, -1))
        // Note that we have to make sure the result is not out of bounds, like
        // simd fract does:
        let oneNextDown = Float(bitPattern:
            0b0_01111110_11111111111111111111111)
        let oneNextDownFloat2 = float2(oneNextDown, oneNextDown)
        return simd.min(wrapped, oneNextDownFloat2)
    }
    @_transparent
    func wrapped(toLowerBound lowerBound: float2,
                 upperBound: float2) -> float2
    {
        let measure = upperBound - lowerBound
        let recipMeasure = simd_precise_recip(measure)
        let scaled = (self - lowerBound) * recipMeasure
        let scaledFract = scaled - floor(scaled)
        let wrapped = simd_muladd(scaledFract, measure, lowerBound)
        // Note that we have to make sure the result is not out of bounds,
        // like simd fract does:
        let maxX = upperBound.x.nextDown // For some reason, this won't be
        let maxY = upperBound.y.nextDown // optimized even when upperBound is
        // statically known, and there is no similar simd function available.
        let maxValue = float2(maxX, maxY)
        return simd.min(wrapped, maxValue)
    }
}
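
A quick sanity check of the float2 versions (my own example values):

import simd

let v = float2(1.75, -0.25)
let unit = v.wrappedInUnitRange()
// fract(1.75) = 0.75, fract(-0.25) = 0.75
let ranged = v.wrapped(toLowerBound: float2(-1, -1), upperBound: float2(1, 1))
// both lanes wrap to -0.25
print(unit, ranged)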

I asked some related simd questions here, which might be of interest.

EDIT2:

As can be seen in the above Swift Forums thread:

// Note that tiny negative values like:
let x: Float = -1e-08
// May produce results outside the [0, 1) range:
let wrapped = x - floor(x)
print(wrapped < 1.0) // false
// which may result in out-of-bounds table accesses
// in common usage, so it's probably better to use:
let correctlyWrapped = simd_fract(x)
print(correctlyWrapped < 1.0) // true

I have since updated the code to account for this.

Upvotes: 0

Davis Herring

Reputation: 40013

There are several reasonable approaches:

  1. fmod(x,1)
  2. modf(x,&dummy) — has the advantage of knowing its divisor statically, but in my testing comes from libc.so.6 even with -ffast-math
  3. x-floor(x) (suggested by Jens in a comment) — supports negative inputs directly
  4. Manual bit-twiddling direct implementation
  5. Manual bit-twiddling implementation of floor

The first two preserve the sign of their input; you can add 1 if it's negative.
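
For example, the fix-up for fmod looks like this (a tiny illustration in Swift; the answer itself is about C, but fmod behaves the same there):

import Foundation

// fmod preserves the sign of its first argument, so wrapping a negative
// value into [0, 1) needs one extra addition:
let x = -0.25
var f = fmod(x, 1.0) // -0.25
if f < 0 { f += 1 }  // 0.75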

The two bit manipulations are very similar: you identify which significand bits correspond to the integer portion, and mask off either those bits (for the direct implementation) or the rest (to implement floor). The direct implementation can be completed either with a floating-point division or with a shift to reassemble the double manually; the former is 28% faster even given hardware CLZ. The floor implementation can immediately reconstitute a double: floor never changes the exponent of its argument unless it returns 0. About 20 lines of C are required.
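
As a sketch of the floor variant, here is my Swift transliteration of the masking idea for IEEE 754 binary64 (1 sign bit, 11 exponent bits, 52 significand bits); it is not the answerer's actual C code:

func maskedFloor(_ x: Double) -> Double {
    let bits = x.bitPattern
    let unbiasedExp = Int((bits >> 52) & 0x7FF) - 1023
    if unbiasedExp < 0 {
        // |x| < 1 (including subnormals): floor is 0 or -1.
        return x < 0 ? -1 : 0
    }
    if unbiasedExp >= 52 {
        // No fractional significand bits left; also covers Inf and NaN.
        return x
    }
    // Clear the significand bits that hold the fractional part.
    let fracMask: UInt64 = (1 << UInt64(52 - unbiasedExp)) - 1
    let truncated = Double(bitPattern: bits & ~fracMask)
    // Masking truncates toward zero; negative non-integers must go down.
    return (x < 0 && truncated != x) ? truncated - 1 : truncated
}

Wrapping to [0, 1) is then x - maskedFloor(x), with the same just-below-1 clamp as in the other answer.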

The following timings are with double and gcc -O3, with timing loops over representative inputs into which the operative code was inlined.

fmod: 41.8 ns
modf: 19.6 ns
floor: 10.6 ns

With -ffast-math:

fmod: 26.2 ns
modf: 30.0 ns
floor: 21.9 ns

Bit manipulation:

direct: 18.0 ns
floor: 20.6 ns

The manual implementations are competitive, but the floor technique is the best. Oddly, two of the three library functions perform better without -ffast-math: that is, as PLT function calls rather than as inlined builtin functions.

Upvotes: 1
