Numpy find multiple strings in 2d array

Question

I'm new to NumPy and it's been a while since I last wrote Python.

I'm struggling to find multiple strings in a NumPy array that I built from split strings.
My data:

string0 = "part0-part1-part2-part3-part4"
string1 = "part5-part6-part9-part7-part8"
string2 = "part5-part6-part1-part8-part7"

I split each string into its parts and combined everything into one array again, so it's all in one place:

import numpy as np

stringsraw = np.array([[string0], [string1], [string2]])
stringssliced = np.array(np.char.split(stringsraw, sep='-').tolist())
stringscombined = np.squeeze(np.dstack((stringsraw, stringssliced)))

Results in:

[['part0-part1-part2-part3-part4' 'part0' 'part1' 'part2' 'part3' 'part4']
 ['part5-part6-part9-part7-part8' 'part5' 'part6' 'part9' 'part7' 'part8']
 ['part5-part6-part1-part8-part7' 'part5' 'part6' 'part1' 'part8' 'part7']]
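If the goal is just a 2-D array whose first column is the full string and whose remaining columns are its parts, a plain list comprehension may be simpler than split/dstack/squeeze. This is a minimal alternative sketch, not the asker's method, and it assumes every string splits into the same number of parts (otherwise NumPy cannot build a regular 2-D string array):

```python
import numpy as np

string0 = "part0-part1-part2-part3-part4"
string1 = "part5-part6-part9-part7-part8"
string2 = "part5-part6-part1-part8-part7"

# Each row: the full string followed by its '-'-separated parts.
# Assumption: every string has the same number of parts.
rows = [[s] + s.split("-") for s in (string0, string1, string2)]
stringscombined = np.array(rows)

print(stringscombined.shape)  # (3, 6)
```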

I want to find the indices of 'part1' and 'part7' in the third row:

np.where((stringscombined[2] == "part1") & (stringscombined[2] == "part7"))

The result is empty. Can anyone explain why the result is not [3, 5]?
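The reason the `&` query returns nothing: `==` compares element by element, and `&` keeps only positions where both comparisons are true for the same element. No single element can equal "part1" and "part7" at once, so the combined mask is all False. Combining the masks with `|`, or using np.isin, asks for elements equal to either string instead. A minimal sketch, where `row` is a hand-built stand-in for stringscombined[2]:

```python
import numpy as np

# Stand-in for stringscombined[2] (the third row)
row = np.array(["part5-part6-part1-part8-part7",
                "part5", "part6", "part1", "part8", "part7"])

# '&' requires one element to equal both strings: impossible, so no hits
both = np.where((row == "part1") & (row == "part7"))[0]

# '|' (or np.isin) matches elements equal to either string
either = np.where((row == "part1") | (row == "part7"))[0]
either_isin = np.where(np.isin(row, ["part1", "part7"]))[0]

print(both)         # []
print(either)       # [3 5]
print(either_isin)  # [3 5]
```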

I thought there would be a nicer way than looping over everything with a for loop.

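The desired query and result would be something like this, where each row of the result holds a row index followed by the column indices of the two strings in that row:

np.where((stringscombined == "part6") & (stringscombined == "part7"))
= array([[1, 2, 4],
         [2, 2, 5]])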

Any help is appreciated.

StupidWolf · Accepted Answer

We can first detect where the two strings are, using np.isin:

np.isin(stringscombined,["part1","part7"])
array([[False, False,  True, False, False, False],
       [False, False, False, False,  True, False],
       [False, False, False,  True, False,  True]])

Using np.where() on this mask tells us where the elements can be found. We need one more piece of information: which rows contain both "part1" and "part7":

(np.sum(stringscombined=="part1",axis=1)>0) & (np.sum(stringscombined=="part7",axis=1)>0)

array([False, False,  True])
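For reference, a sketch of what the two steps produce on the example data (rebuilt by hand here so the block runs on its own). np.where on the isin mask returns one array of row indices and one of column indices, and `.any(axis=1)` is an equivalent, arguably clearer, way to write the `np.sum(...) > 0` row test:

```python
import numpy as np

stringscombined = np.array([
    ["part0-part1-part2-part3-part4", "part0", "part1", "part2", "part3", "part4"],
    ["part5-part6-part9-part7-part8", "part5", "part6", "part9", "part7", "part8"],
    ["part5-part6-part1-part8-part7", "part5", "part6", "part1", "part8", "part7"],
])

# Row and column indices of every element equal to "part1" or "part7"
rows, cols = np.where(np.isin(stringscombined, ["part1", "part7"]))
print(rows)  # [0 1 2 2]
print(cols)  # [2 4 3 5]

# Rows containing both strings; same result as the np.sum(...) > 0 version
has_both = ((stringscombined == "part1").any(axis=1)
            & (stringscombined == "part7").any(axis=1))
print(has_both)  # [False False  True]
```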

This tells us to take indices only from row 2 (the third row). Combining these two pieces of information into a function:

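def index_A(Array, i1, i2):
    # Rows that contain both i1 and i2
    idx = (np.sum(Array == i1, axis=1) > 0) & (np.sum(Array == i2, axis=1) > 0)
    # Row/column indices of every element equal to i1 or i2
    loc = np.where(np.isin(Array, [i1, i2]))
    # For each qualifying row i: its matching column indices, with i prepended
    hits = [np.insert(loc[1][loc[0] == i], 0, i) for i in np.where(idx)[0]]
    return hits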

index_A(stringscombined,"part6","part7")
[array([1, 2, 4]), array([2, 2, 5])]
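The same idea extends to any number of search strings. A minimal sketch; the name index_all and its output format (one array per matching row, row index first, then the matching column indices, mirroring index_A) are hypothetical choices for illustration, not part of the answer above:

```python
import numpy as np

def index_all(arr, *targets):
    """Hypothetical generalization of index_A to any number of targets.
    For each row of `arr` containing every string in `targets`, return
    an array [row, col, col, ...] of the columns that match any target."""
    # Boolean row mask: True where the row contains all targets
    has_all = np.ones(arr.shape[0], dtype=bool)
    for t in targets:
        has_all &= (arr == t).any(axis=1)
    # Element mask: True where the element equals any target
    hit = np.isin(arr, targets)
    return [np.concatenate(([r], np.flatnonzero(hit[r])))
            for r in np.flatnonzero(has_all)]

stringscombined = np.array([
    ["part0-part1-part2-part3-part4", "part0", "part1", "part2", "part3", "part4"],
    ["part5-part6-part9-part7-part8", "part5", "part6", "part9", "part7", "part8"],
    ["part5-part6-part1-part8-part7", "part5", "part6", "part1", "part8", "part7"],
])

# Rows 1 and 2 contain both strings -> [1, 2, 4] and [2, 2, 5]
print(index_all(stringscombined, "part6", "part7"))
```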
