Reputation: 791
I have two strings: one a word and one a scramble of letters. I want to see if this scramble of letters holds enough letters to spell the word. I have come up with an algorithm to do this but it isn't efficient enough and I was hoping I could get some help making it faster.
Here's what I have so far:
s1 = 'hypochondriac'
s2 = 'yqhpwoewnqlchpijcdrxpoa'
temp = list(s1)
for X in s2:
for Y in temp:
if X == Y:
temp.remove(X)
X = '@'
if temp == []:
print('Found ', s1)
I have an issue where once X matches I need to increment X but I didn't know how so I just take it out of the equation by making it the at symbol. I tried using break but it doesn't reach far enough to break the to the s2 loop. Either way, I'm pretty sure this double for loop idea is super slow compared to what someone with some experience would use. Any ideas?
Upvotes: 0
Views: 1306
Reputation: 30258
Extending @Martijn_Pieters solution you can use Counter
in this way:
from collections import Counter
def contains(s1, s2):
c1, c2 = Counter(s1), Counter(s2)
return all(c1[c] <= c2[c] for c in s1)
You can rely on the fact Counter[key]
will default to returning 0 if key
doesn't exist.
Upvotes: 2
Reputation: 221574
Here's a vectorized approach using NumPy
-
import numpy as np
def in_string(s1,s2):
arr1 = np.fromstring(s1, dtype=np.uint8)
arr2 = np.fromstring(s2, dtype=np.uint8)
return np.in1d(arr1,arr2).all()
Sample run -
In [50]: in_string('hypochondriac','yqhpwoewnqlchpijcdrxpoa')
Out[50]: True
# Let's add in a `z` at the end of first word which isn't in the scramble
In [51]: in_string('hypochondriacz','yqhpwoewnqlchpijcdrxpoa')
Out[51]: False
Here's another NumPy based one using np.searchsorted
-
def in_string_v2(s1,s2):
arr1 = np.fromstring(s1, dtype=np.uint8)
arr2 = np.fromstring(s2, dtype=np.uint8)
u1 = np.unique(arr1)
u2 = np.unique(arr2)
return ~(np.searchsorted(u2,u1) == np.searchsorted(u2,u1,'right')).any()
Here's another one that processes a list of words in one go against one scramble at a time -
def in_string_v3(list_s1,s2):
l_arr1 = np.fromstring("".join(list_s1), dtype=np.uint8)
arr2 = np.fromstring(s2, dtype=np.uint8)
lens = np.array(map(len,list_s1))
comp_lens = np.in1d(l_arr1,arr2).cumsum()[lens.cumsum()-1]
calc_lens = np.append(comp_lens[0],comp_lens[1:]-comp_lens[:-1])
return lens == calc_lens
Sample run -
In [185]: ls1 = ['hypochondriac','hypochondriachsdhsahdsadhsa','hihfheifheozz']
In [186]: s2 = 'yqhpwoewnqlchpijcdrxpoadjksdgdkjsfkbdsfbdsdsaduiawyei'
In [187]: in_string_v3(ls1,s2)
Out[187]: array([ True, True, False], dtype=bool)
One more to process a list of words in a different way -
def in_string_v4(list_s1,s2):
l_arr1 = np.fromstring("".join(list_s1), dtype=np.uint8)
arr2 = np.fromstring(s2, dtype=np.uint8)
lens = np.array(map(len,list_s1))
clens = lens.cumsum()
non_matching_idx = np.nonzero(~np.in1d(l_arr1,arr2))[0]
non_matching_grp = np.unique(clens.searchsorted(non_matching_idx))
return ~np.in1d(np.arange(len(list_s1)),non_matching_grp)
Upvotes: 0
Reputation: 1122132
Your code is not efficient, no, because you iterate in a double loop. For each letter in s1
, in the worst-case scenario (no matches) you loop over all of s2
.
Use a Counter
object instead; these act as multi-sets, where you can both test if a character is present in O(1) time and manage remaining counts:
from collections import Counter
def contains(s1, s2):
s2set = Counter(s2)
for c in s1:
count = s2set[c]
if not c:
return False
if count == 1:
del s2set[c]
else:
s2set[c] = count - 1
return True
You can also turn s1
into a multi-set, and check if the multi-set for s2
contains enough letters for each entry:
def contains(s1, s2):
s1set = Counter(s1)
s2set = Counter(s2)
for c, count in s1set.items():
if count > s2set[c]:
return False
return True
The latter can be reduced further with the all()
function, which returns False
early if any of the results it is passed is False
, True
otherwise:
def contains(s1, s2):
s2set = Counter(s2)
return all(count <= s2set[c] for c, count in Counter(s1).items())
In all these you only have to iterate over both s1
and s2
once (either directly or to produce the multi-set).
Demo of the latter:
>>> from collections import Counter
>>> def contains(s1, s2):
... s2set = Counter(s2)
... return all(count <= s2set[c] for c, count in Counter(s1).items())
...
>>> s1 = 'hypochondriac'
>>> s2 = 'yqhpwoewnqlchpijcdrxpoa'
>>> contains(s1, s2)
True
>>> contains(s1 + 'b', s2)
False
Upvotes: 3
Reputation: 42758
Do it the other way round. Remove the characters from s2
:
s1 = 'hypochondriac'
s2 = 'yqhpwoewnqlchpijcdrxpoa'
temp = list(s2)
try:
for ch in s1:
temp.remove(ch)
except ValueError:
print("not found")
else:
print("found", s1)
Upvotes: 0