user61871

Reputation: 1009

Finding indices of matches between a list and multiple sublists in Python

Needle: ['', 'yes', 'yes', '', '', '', 'yes', 'yes', 'yes', '']

Haystack: [['', '', 'yes', 'yes', '', '', 'yes', 'yes', '', 'yes'], ['', '', '', 'yes', 'yes', '', '', '', 'yes', 'yes']]

Needle matches Haystack[0] at indices 2, 6, and 7, and matches Haystack[1] at index 8. I'd like to be able to build these lists of matching indices.

Currently, my code returns [1, 2, 6, 7, 8] for both sublists, so it doesn't tell me where the matches actually are. I'm not sure why it finds a match at 1:

for sublist in (haystack):
    print(needle)
    print(sublist)
    print([i for i, item in enumerate(needle) if item in sublist and item != ''])

and my output looks like

['', 'yes', 'yes', '', '', '', 'yes', 'yes', 'yes', '']
['', '', 'yes', 'yes', '', '', 'yes', 'yes', '', 'yes']
[1, 2, 6, 7, 8]
['', 'yes', 'yes', '', '', '', 'yes', 'yes', 'yes', '']
['', '', '', 'yes', 'yes', '', '', '', 'yes', 'yes']
[1, 2, 6, 7, 8]

Full reproducible:

needle = ['', 'yes', 'yes', '', '', '', 'yes', 'yes', 'yes', '']
haystack = [['', '', 'yes', 'yes', '', '', 'yes', 'yes', '', 'yes'], ['', '', '', 'yes', 'yes', '', '', '', 'yes', 'yes']]

for sublist in (haystack):
    print(needle)
    print(sublist)
    print([i for i, item in enumerate(needle) if item in sublist and item != ''])

Upvotes: 1

Views: 116

Answers (4)

Yaniv

Reputation: 829

If I understand you correctly, you're looking for an element-wise logical AND between the arrays, where "yes" is 1 and "" is 0.

So we first convert your data to binary (you can skip this step if your data is already binary):

import numpy as np
def convert_to_binary(arr):
  return 1 * (np.array(arr) == 'yes')
needle = convert_to_binary(needle)
# array([0, 1, 1, 0, 0, 0, 1, 1, 1, 0])
haystack = np.array([convert_to_binary(h_arr) for h_arr in haystack])
# array([[0, 0, 1, 1, 0, 0, 1, 1, 0, 1],
#        [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]])

Their logical AND:

their_logical_and = needle & haystack
# array([[0, 0, 1, 0, 0, 0, 1, 1, 0, 0],
#        [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])

To get the indices of the non-zero entries, you can use numpy.nonzero:

# .tolist() converts the NumPy integers to plain Python ints
indices = [np.nonzero(arr)[0].tolist() for arr in their_logical_and]
# [[2, 6, 7], [8]]
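
As a side note, the conversion to 0/1 is not strictly needed. Here is a minimal sketch of the same idea using boolean masks directly, assuming every sublist has the same length as the needle (so the haystack forms a rectangular array). The names needle_mask, haystack_mask and both are only illustrative.

```python
import numpy as np

needle = ['', 'yes', 'yes', '', '', '', 'yes', 'yes', 'yes', '']
haystack = [['', '', 'yes', 'yes', '', '', 'yes', 'yes', '', 'yes'],
            ['', '', '', 'yes', 'yes', '', '', '', 'yes', 'yes']]

# Boolean masks: True wherever the entry is 'yes'.
needle_mask = np.array(needle) == 'yes'       # shape (10,)
haystack_mask = np.array(haystack) == 'yes'   # shape (2, 10)

# Broadcasting applies the needle mask to every row of the haystack.
both = needle_mask & haystack_mask

# np.flatnonzero gives the positions of the True entries in each row.
indices = [np.flatnonzero(row).tolist() for row in both]
print(indices)  # [[2, 6, 7], [8]]
```

This relies on the same broadcasting as needle & haystack above; it just skips the intermediate integer arrays.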

Upvotes: 1

juanpa.arrivillaga

Reputation: 95993

Use enumerate and zip:

for sublist in haystack:
    print(needle)
    print(sublist)
    print([i for i, (x, y) in enumerate(zip(needle, sublist)) if x and y and x == y])

Output:

['', 'yes', 'yes', '', '', '', 'yes', 'yes', 'yes', '']
['', '', 'yes', 'yes', '', '', 'yes', 'yes', '', 'yes']
[2, 6, 7]
['', 'yes', 'yes', '', '', '', 'yes', 'yes', 'yes', '']
['', '', '', 'yes', 'yes', '', '', '', 'yes', 'yes']
[8]
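
If you want the match lists as a value rather than printed output, the same comprehension can be wrapped in a function. This is only a sketch: matching_indices and its blank parameter are hypothetical names, not something from the question.

```python
def matching_indices(needle, haystack, blank=''):
    """Return, for each sublist, the positions where needle and sublist
    hold the same non-blank value (hypothetical helper)."""
    return [
        [i for i, (x, y) in enumerate(zip(needle, sublist))
         if x == y and x != blank]
        for sublist in haystack
    ]


needle = ['', 'yes', 'yes', '', '', '', 'yes', 'yes', 'yes', '']
haystack = [['', '', 'yes', 'yes', '', '', 'yes', 'yes', '', 'yes'],
            ['', '', '', 'yes', 'yes', '', '', '', 'yes', 'yes']]

result = matching_indices(needle, haystack)
print(result)  # [[2, 6, 7], [8]]
```

Note that zip stops at the shorter of the two lists, so a sublist that is shorter than the needle is compared only up to its own length instead of raising an IndexError.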

Upvotes: 2

Abhisek Roy

Reputation: 584

This is what you are looking for:

needle = ['', 'yes', 'yes', '', '', '', 'yes', 'yes', 'yes', '']
haystack = [['', '', 'yes', 'yes', '', '', 'yes', 'yes', '', 'yes'],
['', '', '', 'yes', 'yes', '', '', '', 'yes', 'yes']]

for sublist in (haystack):
    print(needle)
    print(sublist)
    print([i for i, item in enumerate(needle) if item == sublist[i] and item != ''])

Upvotes: 1

user61871

Reputation: 1009

As TemporalWolf pointed out, I was enumerating the wrong thing... the following works!

for sublist in (haystack):
    print([i for i, item in enumerate(sublist) if needle[i]=='yes' and sublist[i]=='yes'])
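
For anyone wondering why the original version reported a match at 1: as far as I can tell, item in sublist is a membership test, so it is true whenever 'yes' occurs anywhere in the sublist, not just at position i. A small sketch comparing the two conditions on the first sublist (the variable names original and positional are just for illustration):

```python
needle = ['', 'yes', 'yes', '', '', '', 'yes', 'yes', 'yes', '']
sublist = ['', '', 'yes', 'yes', '', '', 'yes', 'yes', '', 'yes']

# Membership test: asks "does 'yes' appear anywhere in sublist?"
print('yes' in sublist)  # True

# Original condition: keeps every index where needle is 'yes',
# because the membership test is always True for this sublist.
original = [i for i, item in enumerate(needle)
            if item in sublist and item != '']
print(original)  # [1, 2, 6, 7, 8]

# Positional condition: compares the two lists at the same index.
positional = [i for i, item in enumerate(needle)
              if item == 'yes' and sublist[i] == 'yes']
print(positional)  # [2, 6, 7]
```

So the original output was simply the positions of 'yes' in the needle, which is why it was identical for both sublists.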

Upvotes: 1
