Shuffle in one dimension of a matrix (efficiently)?

Question

I was trying to write a function that takes a matrix of 2D points and a probability p, and swaps the coordinates of each point with probability p.

So I asked a question, and I was trying to use a binary sequence as an array of powers of a specific matrix, swap_matrix=[[0,1],[1,0]], to randomly swap (in a specific proportion) the coordinates of a given set of 2D points. However, I realised that the power function only accepts integer values, not arrays. And shuffle, as far as I can understand, works on the whole matrix; you cannot specify a particular dimension.

Having either of these two functions is OK.

For example:

swap(a=[[1,2],[2,3],[3,4],[3,5],[5,6]],b=[0,0,0,1,1])

should return [[1,2],[2,3],[3,4],[5,3],[6,5]]

The idea that just popped up, and that I am now editing in, is:

def swap(mat,K,N):
    #where K/N is the proportion and K and N are natural numbers
    #mat is an N*2 matrix; for each row I want to randomly either
    #swap its coordinates or keep it as it is
    a=[[[0,1],[1,0]]]
    b=[[[1,0],[0,1]]]
    a=np.repeat(a,K,axis=0)
    b=np.repeat(b,N-K,axis=0)
    out=np.append(a,b,axis=0)
    np.random.shuffle(out)
    return np.multiply(mat,out.T)

Here I get an error, because I cannot flatten only once to make the matrices multipliable!

Again, I am looking for an efficient method (vectorized, in the Matlab sense).

P.S. In my special case the matrix has shape (N,2), with the second column all ones, if that helps.

user2379410 · Accepted Answer

Maybe this is good enough for your purposes. In a quick test it appears to be about 13x faster than the blunt for-loop approach (@Naji, posting your "inefficient" code is helpful for making a comparison).

Edited my code following Jaime's comment

import numpy as np

def swap(a, b):
    a = np.copy(a)
    b = np.asarray(b, dtype=bool)
    a[b] = a[b, ::-1]  # equivalent to: a[b] = np.fliplr(a[b])
    return a

# the following is faster, but modifies the original array
# (a must already be a Numpy array here)
def swap_inplace(a, b):
    b = np.asarray(b, dtype=bool)
    a[b] = a[b, ::-1]


print(swap(a=[[1,2],[2,3],[3,4],[3,5],[5,6]], b=[0,0,0,1,1]))

Outputs:

[[1 2]
 [2 3]
 [3 4]
 [5 3]
 [6 5]]
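
To tie this back to the probability p in the question: the mask b can be drawn at random before calling swap. Below is a minimal sketch, assuming every row should be swapped independently with probability p. The helper name swap_with_probability and its seed argument are made up for illustration, and swap is repeated so the snippet runs by itself.

```python
import numpy as np

def swap(a, b):
    a = np.copy(a)                 # copy, so the input is left untouched
    b = np.asarray(b, dtype=bool)
    a[b] = a[b, ::-1]              # reverse only the selected rows
    return a

def swap_with_probability(points, p, seed=None):
    # Hypothetical helper: swap each row's coordinates independently
    # with probability p, and return the mask that was used as well.
    rng = np.random.RandomState(seed)
    mask = rng.rand(len(points)) < p   # True where a row gets swapped
    return swap(points, mask), mask

points = [[1, 2], [2, 3], [3, 4], [3, 5], [5, 6]]
swapped, mask = swap_with_probability(points, 0.4, seed=0)
print(mask)
print(swapped)
```

Note that this gives a swapped fraction of p only on average. If exactly K out of N rows should be swapped, a shuffled mask with K ones (like rand_bin_array further down) is the better fit.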

Edit to include more detailed timings

I wanted to know if I could speed this up further with Cython, so I investigated the efficiency some more :-) The results are worth mentioning, I think (since efficiency is part of the actual question), but I do apologize in advance for the amount of additional code.

First the results. The "cython" function is clearly the fastest of all, roughly another 10x faster than the Numpy solution proposed above. The "blunt loop approach" I mentioned is the function named "loop", but as it turns out, much faster loop-based variants are possible. My pure Python solution is only about 3x slower than the vectorized Numpy code above! Another thing to note is that "swap_inplace" was, most of the time, only marginally faster than "swap". The timings also vary a bit with different random matrices a and b... So now you know :-)

function     | millisec | normalized
-------------+----------+-----------
loop         | 184      | 10
double_loop  |  84      |  4.7
pure_python  |  51      |  2.8
swap         |  18      |  1
swap_inplace |  17      |  0.95
cython       |   1.9    |  0.11

And the rest of the code I used (it seems I took this way too seriously :P):

def loop(a, b):
    a_c = np.copy(a)
    for i in range(a.shape[0]):
        if b[i]:
            a_c[i,:] = a[i, ::-1]
    return a_c

def double_loop(a, b):
    a_c = np.copy(a)
    n, m = a_c.shape
    for i in range(n):
        if b[i]:
            for j in range(m):
                a_c[i, j] = a[i, m-j-1]
    return a_c

def pure_python(a, b):
    # copy every row: a shallow copy would share the row lists with a,
    # so the loop below would overwrite values before reading them
    a_c = [row[:] for row in a]
    n, m = len(a), len(a[0])
    for i in range(n):
        if b[i]:
            for j in range(m):
                a_c[i][j] = a[i][m-j-1]
    return a_c

import pyximport; pyximport.install()
import testcy
def cython(a, b):
    return testcy.swap(a, np.asarray(b, dtype=np.uint8))

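def rand_bin_array(K, N):
    # boolean array of length N with exactly K True entries, in random order
    arr = np.zeros(N, dtype=bool)
    arr[:K]  = 1
    np.random.shuffle(arr)
    return arr

N = 100000
a = np.random.randint(0, N, (N, 2))
b = rand_bin_array(int(0.33*N), N)  # K is used as a slice bound, so it must be an integer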

# before timing the pure python solution I first did:
a = a.tolist()
b = b.tolist()
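
The timing code itself is not shown here. A harness along the following lines could be used to reproduce this kind of comparison. It is only a sketch using the standard timeit module, with a smaller N and just two of the functions; time_ms is a made-up helper name, and the numbers it prints will depend on the machine and on the random input.

```python
import timeit
import numpy as np

def swap(a, b):
    a = np.copy(a)
    b = np.asarray(b, dtype=bool)
    a[b] = a[b, ::-1]
    return a

def loop(a, b):
    a_c = np.copy(a)
    for i in range(a.shape[0]):
        if b[i]:
            a_c[i, :] = a[i, ::-1]
    return a_c

N = 10000  # smaller than in the answer, to keep the run short
rng = np.random.RandomState(0)
a = rng.randint(0, N, (N, 2))
b = rng.rand(N) < 0.33

# Make sure both implementations agree before timing them.
assert np.array_equal(swap(a, b), loop(a, b))

def time_ms(func, repeats=5):
    # Hypothetical helper: best of several single runs, in milliseconds.
    return 1000 * min(timeit.repeat(lambda: func(a, b), number=1, repeat=repeats))

timings = {f.__name__: time_ms(f) for f in (loop, swap)}
for name, ms in sorted(timings.items(), key=lambda kv: -kv[1]):
    print("%-5s %8.2f ms" % (name, ms))
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; averaging would also be defensible.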


######### In the file testcy.pyx #########

#cython: boundscheck=False
#cython: wraparound=False

import numpy as np
cimport numpy as np

def swap(np.ndarray[np.int_t, ndim=2] a, np.ndarray[np.uint8_t, ndim=1] b):
    cdef np.ndarray[np.int_t, ndim=2] a_c
    cdef int n, m, i, j
    a_c = a.copy()
    n = a_c.shape[0]
    m = a_c.shape[1]
    for i in range(n):
        if b[i]:
            for j in range(m):
                a_c[i, j] = a[i, m-j-1]
    return a_c
