StatsSorceress

Reputation: 3099

Python: using setdiff to assign to a numpy array

I have a numpy array:

>>> n1 = np.array([1, 1, 2, 1, 4, 5, 3, 8, 2, 9, 9])

From this, I can get the number of elements from the beginning up to each point where the values drop, i.e. where the next number is lower than the one before it, like this:

>>> wherediff = np.where(n1[1:]-n1[:-1] < 0)
>>> wherediff = wherediff[0] + 1
>>> wherediff
array([3, 6, 8])

I can insert a 0 at the beginning of this array:

>>> wherediff = np.insert(wherediff, 0, 0)
>>> wherediff
array([0, 3, 6, 8])

And I can get the number of elements between each successive value:

>>> sum_vals = np.abs(wherediff[1:] - wherediff[:-1])
>>> sum_vals
array([3, 3, 2])

Now, I want to generate another numpy array with the following properties:

I tried this:

>>> n3 = []
>>> for i in range(1, wherediff.shape[0]):
...     s1 = set(range(wherediff[i]))
...     s2 = set(range(wherediff[i-1]))
...     s3 = np.setdiff1d(s1, s2)[0]
...     n3.append(np.repeat(i, len(s3)))

thinking I'd switch to an array later, but the setdiff1d function is not performing as expected. It's doing this:

>>> for i in range(1, wherediff.shape[0]):
...     s1 = set(range(wherediff[i]))
...     s2 = set(range(wherediff[i-1]))
...     s3 = np.setdiff1d(s1, s2)[0]
...     print(s3)
...
set([0, 1, 2])
set([0, 1, 2, 3, 4, 5])
set([0, 1, 2, 3, 4, 5, 6, 7])

whereas I want:

0 1 2
3 4 5
6 7
8 9 10

Any ideas?

Upvotes: 0

Views: 591

Answers (2)

user2357112

Reputation: 281046

Skip all the setdiff1d stuff and the index manipulation and work with an array of booleans:

flags = n1[1:] < n1[:-1]
flags = np.insert(flags, 0, True)

result = np.cumsum(flags)

The cumsum adds 1 to the sum for every True, so once for the first element and once for every time an element of n1 was less than the previous.
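To make this concrete, here is a small self-contained sketch that runs the same three lines on the n1 from the question. The expected labels in the comment were worked out by hand from the comparison n1[1:] < n1[:-1].

```python
import numpy as np

n1 = np.array([1, 1, 2, 1, 4, 5, 3, 8, 2, 9, 9])

# True wherever an element is smaller than the one before it
flags = n1[1:] < n1[:-1]
# The first element always starts a new group
flags = np.insert(flags, 0, True)

# Running count of group starts gives a 1-based group label per element
result = np.cumsum(flags)
print(result)  # [1 1 1 2 2 2 3 3 4 4 4]
```

Unlike the loop over wherediff in the question, this also labels the final run (indices 8, 9, 10), because every element of n1 gets a label.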

Upvotes: 2

Julien

Reputation: 15071

If you are using native Python sets, you may as well do the diff operation without numpy:

import numpy as np

wherediff = np.array([0, 3, 6, 8])

for i in range(1, wherediff.shape[0]):
    s1 = set(range(wherediff[i]))
    s2 = set(range(wherediff[i-1]))
    # sets are unordered, so sort before building the array
    s3 = np.array(sorted(s1 - s2))
    print(s3)
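As background on why passing sets straight to np.setdiff1d misbehaves, here is a minimal sketch of how NumPy converts a set to an array. It only inspects the conversion and does not call setdiff1d on the set, since what happens after that may differ between NumPy versions.

```python
import numpy as np

s1 = set(range(3))

# NumPy does not iterate over a set; it wraps the whole set
# as a single Python object in a 0-dimensional array.
as_array = np.asarray(s1)
print(as_array.shape, as_array.dtype)

# Going through a sorted list first gives a real 1-D integer array.
as_ints = np.array(sorted(s1))
print(as_ints.shape)
```

So the set operations in NumPy never see the individual integers unless the set is turned into a list or array first.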

If you want to do everything in numpy then this is the way:

for i in range(1, wherediff.shape[0]):
    s1 = np.arange(wherediff[i])
    s2 = np.arange(wherediff[i-1])
    s3 = np.setdiff1d(s1, s2)
    print(s3)

Note that you can pass assume_unique=True here, since the values in both input arrays are already unique.
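A sketch of that variant follows. It also checks an observation that is not part of the answer above: the difference between range(stop) and range(start) is just the contiguous range from start to stop, so np.arange(start, stop) gives the same values without any set operation. The names chunks and labels are illustrative only.

```python
import numpy as np

wherediff = np.array([0, 3, 6, 8])

chunks = []
for start, stop in zip(wherediff[:-1], wherediff[1:]):
    # Both inputs are unique, so setdiff1d can skip deduplication
    via_setdiff = np.setdiff1d(np.arange(stop), np.arange(start),
                               assume_unique=True)
    # Same values, built directly as a contiguous range
    via_arange = np.arange(start, stop)
    assert np.array_equal(via_setdiff, via_arange)
    chunks.append(via_arange)

# Repeat label i once per element of chunk i, as the question's loop intended
labels = np.repeat(np.arange(1, len(wherediff)), np.diff(wherediff))
print(labels)  # [1 1 1 2 2 2 3 3]
```

As in the question's loop, this only covers indices below the last entry of wherediff, so the final run of n1 is not labelled.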

Upvotes: 0
