user1362058
user1362058

Reputation: 791

Efficiently check if one string contain all character of another

I have two strings: one a word and one a scramble of letters. I want to see if this scramble of letters holds enough letters to spell the word. I have come up with an algorithm to do this but it isn't efficient enough and I was hoping I could get some help making it faster.

Here's what I have so far:

s1 = 'hypochondriac'
s2 = 'yqhpwoewnqlchpijcdrxpoa'

temp = list(s1)
for X in s2:
    for Y in temp:
        if X == Y:
            temp.remove(X)
            X = '@'
if temp == []:
    print('Found ', s1)

I have an issue where once X matches I need to increment X but I didn't know how so I just take it out of the equation by making it the at symbol. I tried using break but it doesn't reach far enough to break the to the s2 loop. Either way, I'm pretty sure this double for loop idea is super slow compared to what someone with some experience would use. Any ideas?

Upvotes: 0

Views: 1306

Answers (4)

AChampion
AChampion

Reputation: 30258

Extending @Martijn_Pieters solution you can use Counter in this way:

from collections import Counter
def contains(s1, s2):
    c1, c2 = Counter(s1), Counter(s2)
    return all(c1[c] <= c2[c] for c in s1)

You can rely on the fact Counter[key] will default to returning 0 if key doesn't exist.

Upvotes: 2

Divakar
Divakar

Reputation: 221574

Here's a vectorized approach using NumPy -

import numpy as np

def in_string(s1,s2):
    arr1 = np.fromstring(s1, dtype=np.uint8)
    arr2 = np.fromstring(s2, dtype=np.uint8)
    return np.in1d(arr1,arr2).all()

Sample run -

In [50]: in_string('hypochondriac','yqhpwoewnqlchpijcdrxpoa')
Out[50]: True

# Let's add in a `z` at the end of first word which isn't in the scramble
In [51]: in_string('hypochondriacz','yqhpwoewnqlchpijcdrxpoa')
Out[51]: False

Here's another NumPy based one using np.searchsorted -

def in_string_v2(s1,s2):
    arr1 = np.fromstring(s1, dtype=np.uint8)
    arr2 = np.fromstring(s2, dtype=np.uint8)
    u1 = np.unique(arr1)
    u2 = np.unique(arr2)
    return ~(np.searchsorted(u2,u1) == np.searchsorted(u2,u1,'right')).any()

Here's another one that processes a list of words in one go against one scramble at a time -

def in_string_v3(list_s1,s2):
    l_arr1 = np.fromstring("".join(list_s1), dtype=np.uint8)
    arr2 = np.fromstring(s2, dtype=np.uint8)
    lens = np.array(map(len,list_s1))
    comp_lens = np.in1d(l_arr1,arr2).cumsum()[lens.cumsum()-1]
    calc_lens = np.append(comp_lens[0],comp_lens[1:]-comp_lens[:-1])
    return lens == calc_lens

Sample run -

In [185]: ls1 = ['hypochondriac','hypochondriachsdhsahdsadhsa','hihfheifheozz']

In [186]: s2 = 'yqhpwoewnqlchpijcdrxpoadjksdgdkjsfkbdsfbdsdsaduiawyei'

In [187]: in_string_v3(ls1,s2)
Out[187]: array([ True,  True, False], dtype=bool)

One more to process a list of words in a different way -

def in_string_v4(list_s1,s2):
    l_arr1 = np.fromstring("".join(list_s1), dtype=np.uint8)
    arr2 = np.fromstring(s2, dtype=np.uint8)
    lens = np.array(map(len,list_s1))
    clens = lens.cumsum()
    non_matching_idx = np.nonzero(~np.in1d(l_arr1,arr2))[0]
    non_matching_grp = np.unique(clens.searchsorted(non_matching_idx))
    return ~np.in1d(np.arange(len(list_s1)),non_matching_grp)

Upvotes: 0

Martijn Pieters
Martijn Pieters

Reputation: 1122132

Your code is not efficient, no, because you iterate in a double loop. For each letter in s1, in the worst-case scenario (no matches) you loop over all of s2.

Use a Counter object instead; these act as multi-sets, where you can both test if a character is present in O(1) time and manage remaining counts:

from collections import Counter

def contains(s1, s2):
    s2set = Counter(s2)
    for c in s1:
        count = s2set[c]
        if not c:
            return False
        if count == 1:
            del s2set[c]
        else:
            s2set[c] = count - 1
    return True

You can also turn s1 into a multi-set, and check if the multi-set for s2 contains enough letters for each entry:

def contains(s1, s2):
    s1set = Counter(s1)
    s2set = Counter(s2)
    for c, count in s1set.items():
        if count > s2set[c]:
            return False
    return True

The latter can be reduced further with the all() function, which returns False early if any of the results it is passed is False, True otherwise:

def contains(s1, s2):
    s2set = Counter(s2)
    return all(count <= s2set[c] for c, count in Counter(s1).items())

In all these you only have to iterate over both s1 and s2 once (either directly or to produce the multi-set).

Demo of the latter:

>>> from collections import Counter
>>> def contains(s1, s2):
...     s2set = Counter(s2)
...     return all(count <= s2set[c] for c, count in Counter(s1).items())
...
>>> s1 = 'hypochondriac'
>>> s2 = 'yqhpwoewnqlchpijcdrxpoa'
>>> contains(s1, s2)
True
>>> contains(s1 + 'b', s2)
False

Upvotes: 3

Daniel
Daniel

Reputation: 42758

Do it the other way round. Remove the characters from s2:

s1 = 'hypochondriac'
s2 = 'yqhpwoewnqlchpijcdrxpoa'

temp = list(s2)
try:
    for ch in s1:
        temp.remove(ch)
except ValueError:
    print("not found")
else:
    print("found", s1)

Upvotes: 0

Related Questions