Colin

Reputation: 3762

xtensor equivalent of numpy a[a>3] = 1

Title says it - what is the xtensor equivalent of numpy's

# set all elements > 3 to 1
sometensor[sometensor > 3] = 1 

?

It looks like xt::filter works:

xt::filter(sometensor, sometensor > 3) = 1

But it also looks like the numpy version is much faster. I've built xtensor with xsimd, but it doesn't seem to help in this case. Is there a better, more SIMD-friendly way to do it?

EDIT

I found xt::filtration, which is indeed faster (by about 3x), but still slower than numpy (by about 10x)...

SOLUTION (thx Tom!)

a = xt::where(a > 0.5, 1.0, a);

Is the fastest of all - about 10x faster than filtration, so it looks like it's simd-d!
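As a quick sanity check that the two formulations really agree (sketched in NumPy, mirroring the C++ semantics):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(1000)

# Masked in-place assignment, as in the original question
b = a.copy()
b[b > 0.5] = 1.0

# Branch-free where(): out-of-place, but produces the same result
c = np.where(a > 0.5, 1.0, a)

assert np.array_equal(b, c)
```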

Upvotes: 2

Views: 148

Answers (1)

Tom de Geus

Reputation: 5985

xt::filter appears to be a view, and views are (currently) not super efficient in xtensor. I would use xt::where. It might result in a temporary though, which may or may not also be the case in NumPy. Since I don't know the details of the temporary, let's at least do some timing:

1. NumPy indexing:

import numpy as np 
from datetime import datetime

a = np.random.random([1000000])
start = datetime.now()
a[a > 0.5] = 1.0
stop = datetime.now()
print((stop - start).microseconds)

On my system around 5000 microseconds.

2. NumPy where

import numpy as np 
from datetime import datetime

a = np.random.random([1000000])
start = datetime.now()
a = np.where(a > 0.5, 1.0, a)
stop = datetime.now()
print((stop - start).microseconds)

On my system about 2500 microseconds.

3. xtensor where

#include <iostream>
#include <chrono>
#include <xtensor.hpp>

using namespace std;

int main() 
{
    xt::xtensor<double, 1> a = xt::random::rand<double>({1000000});

    auto start = std::chrono::high_resolution_clock::now();    
    a = xt::where(a > 0.5, 1.0, a);
    auto stop = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    cout << duration.count() << endl;
}

On my system between 2500 and 5000 microseconds (much more distributed than for NumPy) with xsimd, and about twice as long without xsimd.

4. xtensor filter

#include <iostream>
#include <chrono>
#include <xtensor.hpp>

using namespace std;

int main() 
{
    xt::xtensor<double, 1> a = xt::random::rand<double>({1000000});

    auto start = std::chrono::high_resolution_clock::now();    
    xt::filter(a, a > 0.5) = 1.0;
    auto stop = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    cout << duration.count() << endl;
}

On my system about 30000 microseconds, with and without xsimd.

Compilation

I use

cmake_minimum_required(VERSION 3.1)

project(Run)

set(CMAKE_BUILD_TYPE Release)

find_package(xtensor REQUIRED)
find_package(xsimd REQUIRED)
add_executable(${PROJECT_NAME} main.cpp)
target_link_libraries(${PROJECT_NAME} xtensor xtensor::optimize xtensor::use_xsimd)

Without xsimd I omit the last line.

Rosetta / native

I am running on a Mac with an M1 chip. The timings listed above are under Rosetta (i.e. x86). For a native (arm64) build the timings are:

  1. 4500 microseconds.
  2. 1500 microseconds.
  3. 2000 microseconds with and without xsimd (I think xsimd simply doesn't yet work on that chip!).
  4. 15000 microseconds.

Upvotes: 2
