Reputation: 67
I have one numpy array that looks like this:
array([ 0, 1, 2, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16,
18, 19, 20, 22, 27, 28, 29, 32, 33, 34, 36, 37, 38,
39, 42, 43, 44, 45, 47, 48, 51, 52, 54, 55, 56, 60,
65, 66, 67, 68, 69, 70, 71, 73, 74, 75, 77, 78, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 92, 94, 95, 97,
98, 100, 101, 102, 105, 106, 108, 109, 113, 114, 117, 118, 119,
121, 123, 124, 126, 127, 128, 129, 131, 132, 133, 134, 135, 137,
138, 141, 142, 143, 144, 145, 147, 148, 149, 152, 154, 156, 157,
159, 160, 161, 163, 165, 166, 167, 168, 169, 170, 172, 176, 177,
179, 180, 182, 183, 185, 186, 187, 188, 191, 192, 194, 196, 197,
199, 200, 201, 202, 204, 205, 206, 207, 208])
I'm able to convert this to a set using set() with no problem.
However, I have another numpy array that looks like:
array([[ 2],
[ 4],
[ 10],
[ 10],
[ 12],
[ 13],
[ 14],
[ 16],
[ 19],
[ 21],
[ 21],
[ 22],
[ 29],
[209]])
When I try to use set() on it, I get an error: TypeError: unhashable type: 'numpy.ndarray'
How can I convert my second numpy array to look like the first one, so that I can use set() on it?
For reference, my second array is built from a PySpark dataframe column using:
np.array(data2.select('row_num').collect())
Both arrays are used with set() in:
count = sorted(set(range(data1)) - set(np.array(data2.select('row_num').collect())))
Upvotes: 1
Views: 34
Reputation: 23624
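set() fails here because iterating over a 2-D array yields one-element arrays, and numpy arrays are not hashable. Use ravel to return a contiguous flattened array first.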
import numpy as np
arr = np.array(
[[2], [4], [10], [10], [12], [13], [14], [16], [19], [21], [21], [22], [29], [209]]
)
print(set(arr.ravel()))
Outputs:
{2, 4, 10, 12, 13, 14, 16, 209, 19, 21, 22, 29}
This is roughly equivalent to reshaping the array to a single dimension whose length is the array size:
print(set(arr.reshape(arr.size)))
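Applied to the expression from the question, the same idea might look like the sketch below. It does not run Spark: `col` is a hand-built stand-in for the result of np.array(data2.select('row_num').collect()), and `n_rows` is a hypothetical stand-in for data1, assumed here to be an integer row count.

```python
import numpy as np

# Stand-in for np.array(data2.select('row_num').collect()), which has shape (n, 1).
# Values are copied from the question; no Spark call is made here.
col = np.array(
    [[2], [4], [10], [10], [12], [13], [14], [16], [19], [21], [21], [22], [29], [209]]
)

# Hypothetical stand-in for data1 in the question (assumed to be an int).
n_rows = 30

# Flatten to 1-D before building the set; tolist() also converts
# NumPy scalars to plain Python ints.
present = set(col.ravel().tolist())

# Row numbers in range(n_rows) that do not appear in the column.
missing = sorted(set(range(n_rows)) - present)
print(missing)
```

Calling tolist() is optional, since set(col.ravel()) also works, but it keeps the resulting set made of plain Python ints rather than NumPy scalars.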
Upvotes: 1