CIsForCookies
CIsForCookies

Reputation: 12837

find index of element corresponding to another list

I have a list of strings: ls = ['a','b','c'] and another one, with larger strings, guaranteed to include one and only one strings from ls: ls2 = ['1298a', 'eebbbd', 'qcqcq321']".

How can I find, for a given string from ls2, what is the index of the corresponding string from ls?

I can use:

for s in ls:
    for ss in ls2:
        if s in ss:
            print (s,ss,ls.index(s))

a 1298a 0
b eebbbd 1
c qcqcq321 2

but it there something nicer?

EDIT (hope it clarifies):

The actual case I'm working on has a bigger 1st list, and a smaller 2nd:

ls  = ['apo','b','c','d25','egg','f','g']
ls2 = ['apoip21', 'oiujohuid25']

and I want to get the result 0,3 because the 1st item in ls2 has the 1st item from ls, while the 2nd in ls2 has the 4th in ls

Upvotes: 0

Views: 84

Answers (5)

Mad Physicist
Mad Physicist

Reputation: 114548

It doesn't look like you can get away from O(m * n * p) complexity (where m = len(ls), n = len(ls2), p = max(map(len, ls2))) without further information about your data. You can definitely reduce your current loop from O(m2 * n * p) by keeping track of the current index using enumerate. Also, don't forget about early termination:

for string in ls2:
    for index, key in enumerate(ls):
        if key in string:
            print(key, string, index)
            break

Notice that I swapped the inner and outer loop to make the break work properly: you definitely want to check each element of ls2, but only the minimum number of elements in ls.

Here are some timings I accumulated on the different O(m * n * p) solutions presented here. Thanks to @thierry-lathuille for the test data:

ls = ['g', 'j', 'z', 'a', 'rr', 'ttt', 'b', 'c', 'd', 'f']
ls2 = ['1298a', 'eebbb', 'qcqcq321', 'mlkjmd', 'dùmlk', 'lof',
       'erreee', 'bmw', 'ottt', 'jllll', 'lla' ]

def with_table():
    table = {key: index for index, key in enumerate(ls)}
    result = {}
    for string in ls2:
        for key in ls:
            if key in string:
                result[string] = table[key]
    return result

def with_enumerate():
    result = {}
    for string in ls2:
        for index, key in enumerate(ls):
            if key in string:
                result[string] = index
                break
    return result

def with_dict_comp():
    return {string: index for string in ls2 for index, key in enumerate(ls) if key in string}

def with_itertools():
    result = {}
    for (index, key), string in itertools.product(enumerate(ls), ls2):
        if key in string:
            result[string] = index
    return result

%timeit with_table()
4.89 µs ± 61.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit with_enumerate()
5.27 µs ± 66.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit with_dict_comp()
6.9 µs ± 83.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit with_itertools()
17.5 ns ± 0.193 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

As it turns out, creating a lookup table for the indices is slightly faster than computing them on the fly with enumerate.

Upvotes: 2

Thierry Lathuille
Thierry Lathuille

Reputation: 24289

You could use these two improvements:

ls = ['a','b','c']
ls2 = ['1298a', 'eebbbd', 'qcqcq321']

# preprocess ls to avoid calling ls.index each time:
indices = {ss:index for index, ss in enumerate(ls)}

for s in ls2:
    for ss in ls:
        if ss in s:
            print(ss, s, indices[ss])
            # as s is guaranteed to include only one of the substrings,
            # we don't have to test the other substrings once we found a match
            break


# a 1298a 0
# b eebbbd 1
# c qcqcq321 2     

Some timings:

Breaking out of the loop once the match has been found always improves the speed. The overhead due to the creation of the dict of indices makes it slower for very small lists, but is already faster with lists shorter than the ones used in the timings:

ls = ['g', 'j', 'z', 'a', 'rr', 'ttt', 'b', 'c', 'd', 'f']
ls2 = ['1298a', 'eebbb', 'qcqcq321', 'mlkjmd', 'dùmlk', 'lof', 'erreee', 'bmw', 'ottt', 'jllll', 'lla' ]

def original():
    for s in ls:
        for ss in ls2:
            if s in ss:
                out = (s,ss,ls.index(s))


def with_break():

    for s in ls2:
        for ss in ls:
            if ss in s:
                out = (ss, s, ls.index(ss))
                # as s is guaranteed to include only one of the substrings,
                # we don't have to test the other substrings once we found a match
                break


def with_break_and_dict():
    # preprocess ls to avoid calling ls.index each time:
    indices = {ss:index for index, ss in enumerate(ls)}

    for s in ls2:
        for ss in ls:
            if ss in s:
                out = (ss, s, indices[ss])
                # as s is guaranteed to include only one of the substrings,
                # we don't have to test the other substrings once we found a match
                break

Timing results:

%timeit original()
%timeit with_break()
%timeit with_break_and_dict()


# 100000 loops, best of 3: 12.8 µs per loop
# 100000 loops, best of 3: 9.5 µs per loop
# 100000 loops, best of 3: 8.49 µs per loop

Upvotes: 0

zezollo
zezollo

Reputation: 5027

So, if I interpret the "nicer" request as "written in a more compact way", then I suggest all loops and conditions packed in a list comprehension:

>>> ls  = ['apo','b','c','d25','egg','f','g']
>>> ls2 = ['apoip21', 'oiujohuid25']

>>> [ls.index(s) for s in ls for s2 in ls2 if s in s2]
[0, 3]

But it does not improve the complexity, if "nicer" was to be understood as "less complex"...

Upvotes: 0

Thomas Francois
Thomas Francois

Reputation: 866

Using a dict comprehension:

ls  = ['apo','b','c','d25','egg','f','g']
ls2 = ['apoip21', 'oiujohuid25']

result = { string : index for index, i in enumerate(ls) for string in ls2 if i in string }
# {'apoip21': 0, 'oiujohuid25': 3}

Upvotes: 0

Taohidul Islam
Taohidul Islam

Reputation: 5414

Your codes time complexity is O(n^4), you can make it O(n^3) by using dict.

ls = ['a','b','c']
ls2 = ['1298a', 'eebbbd', 'qcqcq321']
word_dict=dict()
for i in range(len(ls)):    #time complexity O(n)

    word_dict[ls[i]]=i
for s in ls:    #O(n)
    for ss in ls2:  #O(n)
        if s in ss: #O(n)
            print(s,ss,word_dict[s]) #O(1)

Upvotes: 0

Related Questions