imsc
imsc

Reputation: 7840

Distance to non consecutive elements in a numpy array

Using numpy or itertools is there a efficient way to determine the distance to next non-consecutive elements.

> import numpy as np 
> a=np.array(['a','b','b','c','d','a','b','b','c','c','c','d'])

I would want the output to be.

[1, 2, 1, 1, 1, 1, 2, 1, 3, 2, 1]

Extending this, I would want the distance to two new elements. The expected output should be

[3, 3, 2, 2, 2, 3, 5, 4]

as the two new elements after a is b (two) and c, and so on.

Edit 1 I have two version for the finding the next new elements:

import numpy as np                                                           
a = np.array(['a', 'b', 'b', 'c', 'd', 'a', 'b', 'b', 'c', 'c', 'c', 'd'])  

# Using numpy
u, idx = np.unique(a, return_inverse=True)                                                      
idx = np.diff(idx)                                                      
idx[idx < 0] = 1
idx[idx > 1] = 1 
count = 1
while 0 in idx:                                                                    
    idx[np.diff(idx) == count] = count+1
    count += 1                                                                                  │                                                                                           
print idx  

# Using loop
oldElement = a[0]
dist = []
count = 1
for elm in a[1:]:
    if elm == oldElement:
        count += 1
    else:
        dist.extend(range(count, 0, -1))
        count = 1
        oldElement = elm
print dist

However this approach can not be simply extended to find 2 new elements.

Upvotes: 3

Views: 636

Answers (3)

shx2
shx2

Reputation: 64368

Unfortunately, I don't have a numpy/vectorized solution to the general problem.

Here is a general solution, which works for any depth. The first part of your question corresponds to depth=1, the second to depth=2. This solution works for higher depths as well.

Clearly, if you only want to solve the depth=1 case, one can come up with a much simpler solution. However, for this problem, generality adds complexity.

from itertools import groupby, chain, izip

ilen = lambda it: sum(1 for dummy in it)

def get_squeezed_counts(a):
    """
    squeeze a sequence to a sequnce of value/count.
    E.g. ['a', 'a', 'a', 'b'] --> [['a',3], ['b',1]]
    """
    return [ [ v, ilen(it) ] for v, it in groupby(a) ]

def get_element_dist(counts, index, depth):
    """
    For a given index in a "squeezed" list, return the distance (in the
    original-array) with a given depth, or None.
    E.g.
    get_element_dist([['a',1],['b',2],['c',1]], 0, depth=1) --> 1     # from a to first b
    get_element_dist([['a',1],['b',2],['c',1]], 1, depth=1) --> 2     # from first b to c
    get_element_dist([['a',1],['b',2],['c',1]], 0, depth=2) --> 3     # from a to c
    get_element_dist([['a',1],['b',2],['c',1]], 1, depth=2) --> None  # from first b to end of sequence
    """
    seen = set()
    sum_counts = 0
    for i in xrange(index, len(counts)):
        v, count = counts[i]
        seen.add(v)
        if len(seen) > depth:
            return sum_counts
        sum_counts += count
    # reached end of sequence before finding the next value
    return None

def get_squeezed_dists(counts, depth):
    """
    Construct a per-squeezed-element distance list, by calling get_element_dist()
    for each element in counts.
    E.g.
    get_squeezed_dists([['a',1],['b',2],['c',1]], depth=1) --> [1,2,None]
    """
    return [ get_element_dist(counts, i, depth=depth) for i in xrange(len(counts)) ]

def get_dists(a, depth):
    counts = get_squeezed_counts(a)
    squeezed_dists = get_squeezed_dists(counts, depth=depth)
    # "Unpack" squeezed dists:
    return list(chain.from_iterable(
        xrange(dist, dist-count, -1)
        for (v, count), dist in izip(counts, squeezed_dists)
        if dist is not None
    ))

print get_dists(['a','b','b','c','d','a','b','b','c','c','c','d'], depth = 1)
# => [1, 2, 1, 1, 1, 1, 2, 1, 3, 2, 1]
print get_dists(['a','a','a'], depth = 1)
# => []
print get_dists(['a','b','b','c','d','a','b','b','c','c','c','d'], depth = 2)
# => [3, 3, 2, 2, 2, 3, 5, 4]
print get_dists(['a','b','a', 'b'], depth = 2)
# => []

For python3, replace xrange->range and izip->zip.

Upvotes: 1

eickenberg
eickenberg

Reputation: 14377

Here is an idea how to vectorize this if there aren't too many unique elements. It may not be sufficiently general and only actually solve your problem sequence a :) (I played around with the arrays until it worked). The while loop at the end should be something that one can optimize:

import numpy as np
a = np.array(['a','b','b','c','d','a','b','b','c','c','c','d'])

aa = a[:, np.newaxis] == np.unique(a)
aaa = np.cumsum(aa[::-1], axis=0)[::-1] * aa

# this is where it gets messy

negative_jump = True
while negative_jump:
    d = np.diff(aaa, axis=0)
    correction = (d + 1) * (d < -1)
    negative_jump = (correction != 0).any()
    aaa[:-1] += correction
result = aaa[:-1].sum(axis=1)

Explanation: Take a look at aaa before the loop. It will contain numbers that are far from 0. In this view on the data it is necessary that passing from one line to another the decrements are never < -1. If they are, the number above was too big. The loop decreases it until it is -1 or 0. Again, not optimal, one can definitely do better.

Upvotes: 0

Cory Kramer
Cory Kramer

Reputation: 118011

This is my attempt at distance of one element.

import numpy as np
a=np.array(['a','b','b','c','d','a','b','b','c','c','c','d'])
out = []
for i in range(len(a)):
    count = 0
    for j in range(len(a) - i):
        if a[i] != a[j+i]:
            out.append(count)
            break
        else:
            count += 1

Result

>>> out
[1, 2, 1, 1, 1, 1, 2, 1, 3, 2, 1]

Upvotes: 0

Related Questions