Reputation: 85
I have two numpy arrays, A and B:
A = ([1, 2, 3, 2, 3, 1, 2, 1, 3])
B = ([2, 3, 1, 2])
where B is a unique pattern within A.
I need the output to be all the elements of A that are not part of that occurrence of B.
Output = ([1, 2, 3, 1, 3])
Upvotes: 2
Views: 166
Reputation: 11781
The easiest approach is to use Python's built-ins, i.e. the string type:
A = "123231213"
B = "2312"
result = A.replace(B, "")
To efficiently convert a numpy.array to and from str (bytes in Python 3), use these functions:
x = numpy.frombuffer(b"3452353", dtype="|i1")
x
array([51, 52, 53, 50, 51, 53, 51], dtype=int8)
x.tobytes()
b'3452353'
(*) This mixes up values and ASCII codes (1 != "1"), but substring search will work just fine. Your data type had better fit in one char, or you may get a false match that starts in the middle of an element.
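To make the false-match risk concrete, here is a small sketch. The arrays and the explicit little-endian "<i2" dtype are made up for illustration. A byte-level search finds B at an offset that is not a multiple of the item size, even though B's value is not an element of A:

```python
import numpy

# Hypothetical two-byte values; little-endian so the byte layout is fixed.
A = numpy.array([0x0102, 0x0304], dtype="<i2")  # bytes: 02 01 04 03
B = numpy.array([0x0401], dtype="<i2")          # bytes: 01 04

# Byte-level search: B's bytes straddle two neighbouring elements of A.
offset = A.tobytes().find(B.tobytes())
is_aligned = offset % A.itemsize == 0
print(offset, is_aligned)
```

Checking that the match offset is a multiple of A.itemsize is one way to reject such hits.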
To sum it up, a quick hack looks like this:
A = numpy.array([1, 2, 3, 2, 3, 1, 2, 1, 3])
B = numpy.array([2, 3, 1, 2])
numpy.frombuffer(A.tobytes().replace(B.tobytes(), b""), dtype=A.dtype)
array([1, 2, 3, 1, 3])
# note, here dtype is some int, I'm relying on the fact that:
# "1 matches 1" is equivalent to "0001 matches 0001"
# this holds as long as values of B are typically non-zero.
#
# this trick can conceptually be used with floating point too,
# but beware of multiple floating point representations of same number
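The quick hack above can be wrapped in a small helper. This is only a sketch: the name remove_subarray is hypothetical, it assumes 1-D arrays, and it uses tobytes/frombuffer, the current spellings of tostring/fromstring. It also skips matches that start in the middle of an element, which the one-liner does not guard against:

```python
import numpy

def remove_subarray(a, b):
    """Return a copy of 1-D array `a` with the first aligned occurrence of `b` removed."""
    b = numpy.asarray(b, dtype=a.dtype)  # both sides must share one byte layout
    haystack, needle = a.tobytes(), b.tobytes()
    if not needle:
        return a.copy()
    start = haystack.find(needle)
    # Skip matches that begin in the middle of an element.
    while start != -1 and start % a.itemsize != 0:
        start = haystack.find(needle, start + 1)
    if start == -1:
        return a.copy()  # pattern not found, nothing to remove
    remaining = haystack[:start] + haystack[start + len(needle):]
    # frombuffer gives a read-only view of the bytes; copy to get a writable array.
    return numpy.frombuffer(remaining, dtype=a.dtype).copy()

A = numpy.array([1, 2, 3, 2, 3, 1, 2, 1, 3])
B = numpy.array([2, 3, 1, 2])
result = remove_subarray(A, B)
print(result)
```

Unlike replace, this removes only the first occurrence, which is enough when B is known to be unique within A.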
In-depth explanation:
Assuming the sizes of A and B are arbitrary, the naive approach runs in quadratic time. However, better probabilistic algorithms exist, for example Rabin-Karp, which relies on a sliding-window hash.
This is the main reason text-oriented functions, such as xxx in str, str.replace, or re, will be much faster than custom numpy code.
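For reference, the sliding-window hash idea behind Rabin-Karp can be sketched in plain Python. The function name, base, and modulus below are arbitrary choices for illustration, not taken from any library, and pure Python like this is far too slow for large arrays; it only shows how the hash is updated in constant time as the window moves:

```python
def rabin_karp_find(text, pattern, base=257, mod=1_000_000_007):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(base, m - 1, mod)  # weight of the element leaving the window
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + pattern[i]) % mod
        w_hash = (w_hash * base + text[i]) % mod
    for i in range(n - m + 1):
        # Compare elements only when the hashes agree, to rule out collisions.
        if w_hash == p_hash and list(text[i:i + m]) == list(pattern):
            return i
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m].
            w_hash = ((w_hash - text[i] * high) * base + text[i + m]) % mod
    return -1

A = [1, 2, 3, 2, 3, 1, 2, 1, 3]
B = [2, 3, 1, 2]
i = rabin_karp_find(A, B)
result = A[:i] + A[i + len(B):]
print(i, result)
```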
If you truly need this function to be integrated with numpy, you can always write an extension, but it's not easy :)
Upvotes: 2