LN3

Reputation: 67

How to convert different numpy arrays to sets?

I have one numpy array that looks like this:

array([  0,   1,   2,   6,   8,   9,  10,  11,  12,  13,  14,  15,  16,
        18,  19,  20,  22,  27,  28,  29,  32,  33,  34,  36,  37,  38,
        39,  42,  43,  44,  45,  47,  48,  51,  52,  54,  55,  56,  60,
        65,  66,  67,  68,  69,  70,  71,  73,  74,  75,  77,  78,  80,
        81,  82,  83,  84,  85,  86,  87,  88,  89,  92,  94,  95,  97,
        98, 100, 101, 102, 105, 106, 108, 109, 113, 114, 117, 118, 119,
       121, 123, 124, 126, 127, 128, 129, 131, 132, 133, 134, 135, 137,
       138, 141, 142, 143, 144, 145, 147, 148, 149, 152, 154, 156, 157,
       159, 160, 161, 163, 165, 166, 167, 168, 169, 170, 172, 176, 177,
       179, 180, 182, 183, 185, 186, 187, 188, 191, 192, 194, 196, 197,
       199, 200, 201, 202, 204, 205, 206, 207, 208])

I'm able to convert this to a set using set() with no problem.

However, I have another numpy array that looks like this:

array([[  2],
       [  4],
       [ 10],
       [ 10],
       [ 12],
       [ 13],
       [ 14],
       [ 16],
       [ 19],
       [ 21],
       [ 21],
       [ 22],
       [ 29],
       [209]]) 

When I try to use set() on it, I get an error: TypeError: unhashable type: 'numpy.ndarray'
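A minimal reproduction of that error, using a small hard-coded array as a stand-in for the collected column: iterating over a 2-D array yields its rows, each of which is itself a 1-D ndarray, and ndarrays are not hashable, so set() cannot store them.

```python
import numpy as np

# Small stand-in for the (n, 1) array built from the collected column
col = np.array([[2], [4], [10], [10]])

# Iterating a 2-D array yields its rows, each a 1-D ndarray of shape (1,)
first_row = next(iter(col))
print(type(first_row).__name__, first_row.shape)  # ndarray (1,)

error_message = ""
try:
    set(col)  # set() tries to hash each row, which is an ndarray
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # unhashable type: 'numpy.ndarray'
```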

How can I convert my second numpy array into the same one-dimensional shape as the first, so that I can use set() on it?

For reference, my second array is created from a PySpark DataFrame column using:

np.array(data2.select('row_num').collect())

And both arrays are used with set() in:

count = sorted(set(range(data1)) - set(np.array(data2.select('row_num').collect())))
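The expression computes the row numbers in range(data1) that do not appear in the collected column. A sketch of it with the column flattened first, assuming data1 is an integer row count (range() requires one) and using a hard-coded (n, 1) array in place of the PySpark result; both values are hypothetical stand-ins. np.setdiff1d is shown as a NumPy-only alternative: it flattens its inputs and returns the sorted unique values of the first array that are absent from the second.

```python
import numpy as np

# Hypothetical stand-ins: data1 is assumed to be an int, and collected mimics
# np.array(data2.select('row_num').collect()), which has shape (n, 1)
data1 = 12
collected = np.array([[2], [4], [10], [10]])

# Flatten to 1-D, convert to plain Python ints, then take the set difference
present = set(collected.ravel().tolist())
count = sorted(set(range(data1)) - present)
print(count)  # [0, 1, 3, 5, 6, 7, 8, 9, 11]

# NumPy alternative: setdiff1d flattens both inputs and returns a sorted array
missing = np.setdiff1d(np.arange(data1), collected)
print(missing.tolist())  # [0, 1, 3, 5, 6, 7, 8, 9, 11]
```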

Upvotes: 1

Views: 34

Answers (1)

flakes

Reputation: 23624

Use ravel to return a contiguous flattened (1-D) array. Its elements are scalars rather than row arrays, so set() can hash them.

import numpy as np

arr = np.array(
    [[2], [4], [10], [10], [12], [13], [14], [16], [19], [21], [21], [22], [29], [209]]
)

print(set(arr.ravel()))

Output (as printed with NumPy 1.x; NumPy 2.x shows each element as np.int64(...)):

{2, 4, 10, 12, 13, 14, 16, 209, 19, 21, 22, 29}

This is roughly equivalent to reshaping the array to a single dimension whose length is the array's size:

print(set(arr.reshape(arr.size)))
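One practical difference between the flattening options, sketched on a small array: ravel and reshape return a view of the original data when they can (as they do here, because the array is contiguous), while flatten always makes a copy. reshape(-1) lets NumPy infer the length instead of passing arr.size. For building a set, any of them works.

```python
import numpy as np

arr = np.array([[2], [4], [10], [10], [12]])

flat_ravel = arr.ravel()               # view when possible (arr is contiguous)
flat_reshape = arr.reshape(arr.size)   # explicit length; also a view here
flat_reshape_auto = arr.reshape(-1)    # -1 lets NumPy infer the length
flat_copy = arr.flatten()              # always returns a new copy

print(flat_ravel.shape, flat_reshape.shape, flat_reshape_auto.shape)  # (5,) (5,) (5,)
print(np.shares_memory(arr, flat_ravel))  # True
print(np.shares_memory(arr, flat_copy))   # False

# .tolist() converts to plain Python ints before building the set
values = set(flat_ravel.tolist())
print(sorted(values))  # [2, 4, 10, 12]
```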

Upvotes: 1
