Python: How to find the a unique element pattern from 2 arrays?

Question

I have two numpy arrays, A and B:

A = ([1, 2, 3, 2, 3, 1, 2, 1, 3])
B = ([2, 3, 1, 2])

where B is a unique pattern within A.

I need the output to be all the elements of A, which aren't present in B.

Output = ([1, 2, 3, 1, 3])

Dima Tisnek · Accepted Answer

Easiest is to use Python's builtins, i.e. string type:

A = "123231213"
B = "2312"
result = A.replace(B, "")

To efficiently convert numpy.array to an from str, use these functions:

x = numpy.frombuffer("3452353", dtype="|i1")
x
array([51, 52, 53, 50, 51, 53, 51], dtype=int8)
x.tostring()
"3452353"

(*) thus mixes up ascii codes (1 != "1"), but substring search will work just fine. Your data type should better fit in one char, or you may get a false match.

To sum it up, a quick hack looks like this:

A = numpy.array([1, 2, 3, 2, 3, 1, 2, 1, 3])
B = numpy.array([2, 3, 1, 2])
numpy.fromstring(A.tostring().replace(B.tostring(), ""), dtype=A.dtype)
array([1, 2, 3, 1, 3])
# note, here dtype is some int, I'm relying on the fact that:
# "1 matches 1" is equivalent to "0001 matches 00001"
# this holds as long as values of B are typically non-zero.
#
# this trick can conceptually be used with floating point too,
# but beware of multiple floating point representations of same number

In depth explanation:

Assuming size of A and B is arbitrary, naive approach runs in quadratic time. However better, probabilistic algorithms exit, for example Rabin-Karp, which relies on sliding window hash.

Which is the main reason text oriented functions, such as xxx in str or str.replace or re will be much faster than custom numpy code.

If you truly need this function to be integrated with numpy, you can always write an extension, but it's not easy :)

Python: How to find the a unique element pattern from 2 arrays?

Answers (1)

Related Questions