Hans Roelofsen

Reputation: 735

Identify all unique combinations along the third dimension of stacked 2D numpy arrays

For 2 or more 2D integer numpy arrays stacked along axis=0, I am interested in:

  1. identifying all unique numerical combinations along the third dimension;
  2. labelling each combination with a new numerical value ('labels');
  3. generating a new 2D array whose values are the labels signifying the combination of source-array values at each position.

Sample data:

import numpy as np
arr1 = np.array(np.random.randint(low=0, high=4, size=25)).reshape(5,5)
arr2 = np.array(np.random.randint(low=0, high=4, size=25)).reshape(5,5)

A list of tuples of the combinations of interest can be obtained:

xx, yy = np.meshgrid(arr1, arr2, sparse=True)
combis = np.stack([xx.reshape(arr1.size), yy.reshape(arr2.size)])
u_combis = np.unique(combis, axis=1)
u_combis_lst = list(map(tuple, u_combis.T))
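A quick check, added for illustration: with sparse=True (and the default indexing='xy'), np.meshgrid does not expand a full grid here. It returns the flattened arr1 with shape (1, 25) and the flattened arr2 with shape (25, 1), so after the reshapes combis holds the position-by-position pairs. That is the same array that stacking arr1.ravel() and arr2.ravel() gives. The name direct is introduced only for this comparison.

```python
import numpy as np

# Small 2D test arrays, as in the question
arr1 = np.random.randint(low=0, high=4, size=25).reshape(5, 5)
arr2 = np.random.randint(low=0, high=4, size=25).reshape(5, 5)

# sparse=True with 'xy' indexing: xx is arr1 flattened to shape (1, 25),
# yy is arr2 flattened to shape (25, 1); no 25 x 25 grid is built
xx, yy = np.meshgrid(arr1, arr2, sparse=True)

# The question's combis versus stacking the raveled inputs directly
combis = np.stack([xx.reshape(arr1.size), yy.reshape(arr2.size)])
direct = np.stack([arr1.ravel(), arr2.ravel()])
print(np.array_equal(combis, direct))  # True
```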

Generate dictionary to map each combination to a label:

labels = list(range(len(u_combis_lst)))
label_dict = dict(zip(u_combis_lst, labels))
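To make the first two steps concrete, here is a minimal sketch that runs the code above unchanged on tiny hand-picked 2 x 2 arrays (the values are made up for illustration). Because np.unique(..., axis=1) sorts the unique columns lexicographically, the labels follow the ascending order of the (arr1, arr2) pairs.

```python
import numpy as np

# Tiny fixed inputs so the mapping can be checked by hand
arr1 = np.array([[0, 1],
                 [1, 0]])
arr2 = np.array([[2, 2],
                 [3, 2]])

# The question's steps
xx, yy = np.meshgrid(arr1, arr2, sparse=True)
combis = np.stack([xx.reshape(arr1.size), yy.reshape(arr2.size)])
u_combis = np.unique(combis, axis=1)   # unique columns, sorted lexicographically
u_combis_lst = list(map(tuple, u_combis.T))
labels = list(range(len(u_combis_lst)))
label_dict = dict(zip(u_combis_lst, labels))

# combis has columns (0, 2), (1, 2), (1, 3), (0, 2); the repeated (0, 2)
# collapses, so label_dict maps (0, 2) -> 0, (1, 2) -> 1, (1, 3) -> 2
```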

Now, bullet points 1 and 2 seem to be achieved. My questions are:

  1. How can I apply label_dict to arr1 and arr2 combined?
  2. How can my code suggestions be improved?
  3. How can the code be made to work with more than 2 arrays?

For completeness, my aim is to recreate the functionality of the 'combine' function in ArcGIS Pro.

Upvotes: 1

Views: 338

Answers (2)

JohanB

Reputation: 36

Another approach could be to create a dictionary lookup table based on the unique tuple combinations of the array values.

# start with flattened arrays
arr1 = np.random.randint(low=0, high=4, size=25)
arr2 = np.random.randint(low=0, high=4, size=25)

# pair up the values position by position as tuples
combis = list(zip(arr1, arr2))

u_combis = set(combis) # get unique combinations

# create a dictionary of the unique tuples with the unique values
u_combi_dict = {combi:n for n, combi in enumerate(u_combis)}

# use the unique dictionary combinations to match the tuples
combi_arr = np.array([u_combi_dict[combi] for combi in combis])

# if needed, reshape back to original extent for spatial analysis
combi_arr_grid = combi_arr.reshape(5, 5)
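With set(), the order in which the unique tuples are enumerated, and therefore which label each combination receives, is arbitrary rather than sorted. If labels should follow the ascending order of the value combinations (the order np.unique produces in the question), one option is to sort the unique tuples before numbering them. A minimal variant of the code above:

```python
import numpy as np

arr1 = np.random.randint(low=0, high=4, size=25)
arr2 = np.random.randint(low=0, high=4, size=25)

combis = list(zip(arr1, arr2))

# sorted() orders the unique tuples lexicographically, so label 0 is the
# smallest (arr1, arr2) pair; iterating a set gives no such guarantee
u_combis = sorted(set(combis))

u_combi_dict = {combi: n for n, combi in enumerate(u_combis)}
combi_arr = np.array([u_combi_dict[combi] for combi in combis])
combi_arr_grid = combi_arr.reshape(5, 5)
```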

A generic function that accepts an arbitrary number of flattened (1D) input arrays could work as follows:

def combine(input_arrays):

    combis = list(zip(*input_arrays))
    u_combis = set(combis)

    u_combi_dict = {combi: n for n, combi in enumerate(u_combis)}
    combi_arr = np.array([u_combi_dict[combi] for combi in combis])

    return combi_arr
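A usage sketch with three 2D arrays. combine() relies on zip(), which pairs up whatever its inputs yield when iterated; a 2D array yields whole rows, which are not hashable, so each raster is flattened on the way in and the labels are reshaped on the way out. The names rasters, labels and labels_grid are only for this example.

```python
import numpy as np

def combine(input_arrays):
    # Same function as above: one tuple per position, one label per unique tuple
    combis = list(zip(*input_arrays))
    u_combis = set(combis)
    u_combi_dict = {combi: n for n, combi in enumerate(u_combis)}
    combi_arr = np.array([u_combi_dict[combi] for combi in combis])
    return combi_arr

# Three 5 x 5 rasters with small integer values
rasters = [np.random.randint(low=0, high=4, size=(5, 5)) for _ in range(3)]

# Pass flattened (1D) arrays so zip() pairs individual values ...
labels = combine([r.ravel() for r in rasters])

# ... then reshape the labels back to the raster extent
labels_grid = labels.reshape(rasters[0].shape)
```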

Upvotes: 2

Mark Setchell

Reputation: 207660

If your numbers are small, e.g. np.uint8 (like the labels from an unsupervised classification), you could shift and OR the layers together into one fat 64-bit integer and do the combining on that. This allows you to combine up to 8 np.uint8 layers or 4 non-negative 16-bit layers (e.g. np.uint16).

#!/usr/bin/env python3

import numpy as np

# Ensure repeatable, deterministic randomness!
np.random.seed(42)

# Generate test arrays
arr2 = np.array(np.random.randint(low=0, high=4, size=25)).reshape(5,5)
arr1 = np.array(np.random.randint(low=0, high=4, size=25)).reshape(5,5)

# Build a FatThing by shifting and ORing the arrays together; for 3 arrays use FatThing = arr1 | (arr2<<8) | (arr3<<16)
FatThing = arr1 | (arr2<<8)

# Find unique values in FatThing
uniques = np.unique(FatThing)

# Make lookup table of labels corresponding to each fat value
FatThing2label = {uniques[i]:i for i in range(len(uniques))}

# Lookup label of each fat value
result = [FatThing2label[int(x)] for x in np.nditer(FatThing)]
result = np.array(result).reshape(arr1.shape)

That generates arr1 as:

array([[1, 1, 1, 3, 3],
       [0, 0, 3, 1, 1],
       [0, 3, 0, 0, 2],
       [2, 2, 1, 3, 3],
       [3, 3, 2, 1, 1]])

And arr2 as:

array([[2, 3, 0, 2, 2],
       [3, 0, 0, 2, 1],
       [2, 2, 2, 2, 3],
       [0, 3, 3, 3, 2],
       [1, 0, 1, 3, 3]])

Which makes FatThing look like this:

array([[513, 769,   1, 515, 515],
       [768,   0,   3, 513, 257],
       [512, 515, 512, 512, 770],
       [  2, 770, 769, 771, 515],
       [259,   3, 258, 769, 769]])

And result is this:

array([[ 8, 11,  1,  9,  9],
       [10,  0,  3,  8,  4],
       [ 7,  9,  7,  7, 12],
       [ 2, 12, 11, 13,  9],
       [ 6,  3,  5, 11, 11]])
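The same idea can be sketched for any number of layers. combine_packed below is a hypothetical helper, not part of the answer: it assumes every value is a non-negative integer below 2**bits and that len(layers) * bits <= 64. It casts each layer to np.uint64 before shifting, because a np.uint8 layer shifted by 8 within its own dtype would overflow, and it swaps the dictionary lookup for np.unique(..., return_inverse=True), whose inverse indices are exactly the positions of each packed value in the sorted uniques.

```python
import numpy as np

def combine_packed(layers, bits=8):
    """Label each unique combination of values across same-shaped layers.

    Hypothetical helper. Assumes all values are non-negative integers
    below 2**bits and that len(layers) * bits <= 64.
    """
    if len(layers) * bits > 64:
        raise ValueError("packed values would not fit in 64 bits")
    fat = np.zeros(layers[0].shape, dtype=np.uint64)
    for i, layer in enumerate(layers):
        # Cast first: shifting a uint8 layer by 8 in its own dtype overflows
        fat |= layer.astype(np.uint64) << np.uint64(i * bits)
    # inverse[k] is the index of cell k's packed value in the sorted uniques
    _, inverse = np.unique(fat.ravel(), return_inverse=True)
    return inverse.reshape(fat.shape)


# Example: three uint8 layers, e.g. classification rasters
rng = np.random.default_rng(0)
layers = [rng.integers(0, 4, size=(5, 5), dtype=np.uint8) for _ in range(3)]
labels = combine_packed(layers)
```

Since both this sketch and the dictionary lookup above number the packed values in ascending order, combine_packed([arr1, arr2]) on the arr1 and arr2 shown above should give the same labels as result.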

Upvotes: 1
