Reputation: 10511
Python has string.find() and string.rfind() to get the index of a substring in a string.
I'm wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).
For example:
string = "test test test test"
print string.find('test') # 0
print string.rfind('test') # 15
#this is the goal
print string.find_all('test') # [0,5,10,15]
For counting the occurrences, see Count number of occurrences of a substring in a string.
Upvotes: 603
Views: 855453
Reputation: 10801
All the answers so far imply inefficient solutions that take O(n*m) time, where n is the "haystack" length and m is the "needle" length, though I'm not sure whether that's true for the regular expression solution.
The problem can be solved in O(n+m) time using a version of the Knuth–Morris–Pratt algorithm that doesn't stop after an occurrence is found:
# Not necessary to comprehend, just copy to your code
def findAll(haystack: str, needle: str):
    n = len(haystack)
    m = len(needle)
    # Key - needle prefix length,
    # Value - the length of the longest other needle prefix that is also a suffix of this prefix.
    longestPrefixSuffix = [0] * m
    length = 0
    suffixEnd = 1  # Last index
    while suffixEnd < m - 1:
        if needle[length] == needle[suffixEnd]:
            length += 1
            suffixEnd += 1
            longestPrefixSuffix[suffixEnd] = length
        elif length > 0:
            # Since needle[0:length] == needle[suffixEnd-length:suffixEnd],
            # needle[0:longestPrefixSuffix[length]] == needle[suffixEnd-longestPrefixSuffix[length]:suffixEnd]
            # Try to continue the equal substrings with the shorter prefix
            length = longestPrefixSuffix[length]
        else:
            suffixEnd += 1
    i = 0  # haystack index
    j = 0  # needle index
    while i <= n - m:
        if haystack[i + j] == needle[j]:
            if j + 1 < m:
                j += 1
                continue
            yield i
        if j > 0:
            # Move i to the end of the compared region,
            # unless a part of needle is a prefix of needle
            i = i + j - longestPrefixSuffix[j]
            j = longestPrefixSuffix[j]
        else:
            i += 1
            j = 0
print(list(findAll("test test test test", "test")))  # [0, 5, 10, 15]
A similarly efficient search algorithm is used inside the built-in find method (as far as I know, CPython's is not KMP itself). I wish a findAll function were built in as well.
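To make the O(n*m) figure above concrete, here is a small sketch that counts the character comparisons made by a plain left-to-right scan. The name naive_comparisons and the test strings are made up for illustration, and this loop is only a model of the naive approach, not what str.find actually does.

```python
def naive_comparisons(haystack, needle):
    """Count character comparisons made by a naive left-to-right scan."""
    comparisons = 0
    for i in range(len(haystack) - len(needle) + 1):
        for j in range(len(needle)):
            comparisons += 1
            if haystack[i + j] != needle[j]:
                break  # mismatch: give up on this start position
    return comparisons

# Worst case: every start position matches almost the whole needle.
# 990 start positions * 11 comparisons each.
print(naive_comparisons("a" * 1000, "a" * 10 + "b"))  # 10890
```

With a needle that fails on its last character at every position, the count grows with the product of the two lengths, which is the case KMP avoids.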
Upvotes: 0
Reputation: 86323
Here's a (very inefficient) way to get all (i.e. even overlapping) matches:
>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]
This solution also works for multi-word substrings:
sub = "THIS SUB-WORD"
s = "Find THIS SUB-WORD in this sentence with THIS SUB-WORD"
[i for i in range(len(s)) if s.startswith(sub, i)]
# [5, 41]
Upvotes: 90
Reputation: 418
I think the cleanest solution uses no libraries and no yield:
def find_all_occurrences(string, sub):
    index_of_occurrences = []
    current_index = 0
    while True:
        current_index = string.find(sub, current_index)
        if current_index == -1:
            return index_of_occurrences
        index_of_occurrences.append(current_index)
        # Skip past this match, so overlapping matches are not reported
        current_index += len(sub)
find_all_occurrences(string, substr)
Note: the find() method returns -1 when it can't find anything.
Upvotes: 3
Reputation: 27
Try this, it worked for me!
x=input('enter the string')
y=input('enter the substring')
while z!=r:
    print(z,r,end=' ')
Upvotes: -1
Reputation: 81
To find all occurrences of each character in a given string and return them as a dictionary, e.g. hello gives {'h': 1, 'e': 1, 'l': 2, 'o': 1}:
def count(string):
    result = {}
    for i in string:
        result[i] = string.count(i)
    return result
Or else you can do it like this:
from collections import Counter
def count(string):
    return Counter(string)
Upvotes: -1
Reputation: 143
Here's a solution that I came up with, using assignment expression (new feature since Python 3.8):
string = "test test test test"
phrase = "test"
start = -1
result = [(start := string.find(phrase, start + 1)) for _ in range(string.count(phrase))]
[0, 5, 10, 15]
Upvotes: 2
Reputation: 319
If you want to do it without re (regex):
find_all = lambda _str,_w : [ i for i in range(len(_str)) if _str.startswith(_w,i) ]
string = "test test test test"
print( find_all(string, 'test') ) # >>> [0, 5, 10, 15]
Upvotes: 2
Reputation: 374
You can try :
import re
str1 = "This dress looks good; you have good taste in clothes."
substr = "good"
result = [_.start() for _ in re.finditer(substr, str1)]
# result = [17, 32]
Upvotes: 8
Reputation: 11
I ran into the same problem and did this:
hw = 'Hello oh World!'
list_hw = list(hw)
o_in_hw = []
while True:
    o = hw.find('o')
    if o != -1:
        o_in_hw.append(o)
        # Blank out the match so the next find() moves on
        list_hw[o] = ' '
        hw = ''.join(list_hw)
    else:
        break
I'm pretty new at coding, so you can probably simplify it (and if you plan to use it repeatedly, of course make it a function).
All in all, it works as intended for what I was doing.
Edit: Please note this is for single characters only, and it changes your variable, so you have to copy the string into a new variable to keep the original. I didn't put that in the code because it's easy and the code is only meant to show how I made it work.
Upvotes: 0
Reputation: 372
If you only want to use numpy, here is a solution:
import numpy as np
S= "test test test test"
S2 = 'test'
inds = np.cumsum([len(k)+len(S2) for k in S.split(S2)[:-1]])- len(S2)
Upvotes: 1
Reputation: 1
def count_substring(string, sub_string):
    c = 0  # number of (possibly overlapping) occurrences
    for i in range(len(string) - len(sub_string) + 1):
        if string[i:i+len(sub_string)] == sub_string:
            c += 1
    return c
if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()
    count = count_substring(string, sub_string)
    print(count)
Upvotes: 0
Reputation: 446
Not exactly what OP asked but you could also use the split function to get a list of where all the substrings don't occur. OP didn't specify the end goal of the code but if your goal is to remove the substrings anyways then this could be a simple one-liner. There are probably more efficient ways to do this with larger strings; regular expressions would be preferable in that case
# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']
# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'
Did a brief skim of other answers so apologies if this is already up there.
Upvotes: 0
Reputation: 9
def find_index(string, let):
    enumerated = [place for place, letter in enumerate(string) if letter == let]
    return enumerated
For example:
find_index("hey doode find d", "d")
[4, 7, 13, 15]
Upvotes: 0
Reputation: 61
This is a solution to a similar question from HackerRank. I hope it helps.
import re
a = input()
b = input()
if b in a:
    # Zero-width lookahead, so overlapping matches are found too
    start_indc = [m.start() for m in re.finditer('(?=' + re.escape(b) + ')', a)]
    for i in range(len(start_indc)):
        print((start_indc[i], start_indc[i]+len(b)-1))
(0, 1)
(1, 2)
(4, 5)
Upvotes: 2
Reputation: 1892
src = input() # we will find substring in this string
sub = input() # substring
res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)
Upvotes: 4
Reputation: 115
This function does not look at every position inside the string, so it does not waste compute resources. My try:
def findAll(string, word):
    all_positions, pos = [], string.find(word)
    while pos != -1:
        all_positions.append(pos)
        pos = string.find(word, pos + 1)
    return all_positions
To use it, call it like this:
result=findAll('this word is a big word man how many words are there?','word')
Upvotes: 3
Reputation: 9
By slicing we find all the combinations possible and append them in a list and find the number of times it occurs using count
for i in range(0,n):
for j in range(1,n+1):
if f in l:
Upvotes: -1
Reputation: 383
You can easily use:
Upvotes: -3
Reputation: 13672
When looking for a large number of keywords in a document, use flashtext:
from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)
Flashtext runs faster than regex on large lists of search words.
Upvotes: 3
Reputation: 3455
This does the trick for me using re.finditer
import re
text = 'This is sample text to test if this pythonic '\
'program can serve as an indexing platform for '\
'finding words in a paragraph. It can give '\
'values as to where the word is located with the '\
'different examples as stated'
# find all occurrences of the word 'as' in the above text
find_the_word = re.finditer('as', text)
for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))
Upvotes: 10
Reputation: 138317
There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:
import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]
If you want to find overlapping matches, lookahead will do that:
[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]
If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:
search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]
re.finditer returns an iterator, so you could change the [] in the above to () to get a generator instead of a list, which will be more efficient if you're only iterating through the results once.
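One caveat with the regex approach: the search string is interpreted as a pattern, so characters such as . or ( change what gets matched. A minimal sketch of a literal search, assuming you want plain substring semantics (the helper name find_all_literal is made up here; re.escape and re.finditer are the standard library calls):

```python
import re

def find_all_literal(text, sub):
    """Start indexes of non-overlapping literal occurrences of sub in text."""
    # re.escape neutralises regex metacharacters in the search string
    return [m.start() for m in re.finditer(re.escape(sub), text)]

text = "a.b a.b axb"
print(find_all_literal(text, "a.b"))                   # [0, 4]
print([m.start() for m in re.finditer("a.b", text)])   # [0, 4, 8]
```

Without escaping, the dot matches any character, so "axb" is reported as well.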
Upvotes: 807
Reputation: 39
The pythonic way would be:
mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]
# s represents the search string
# c represents the character string
find_all(mystring,'o') # will return all positions of 'o'
[4, 7, 20, 26]
Upvotes: 1
Reputation: 7268
You can try :
>>> string = "test test test test"
>>> for index,value in enumerate(string):
...     if string[index:index+(len("test"))] == "test":
...         print(index)
Upvotes: 6
Reputation: 65843
You can use re.finditer() for non-overlapping matches.
>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print [(a.start(), a.end()) for a in list(re.finditer('is', aString))]
[(2, 4), (5, 7), (38, 40), (42, 44)]
But it won't find overlapping matches:
In [1]: aString="ababa"
In [2]: print [(a.start(), a.end()) for a in list(re.finditer('aba', aString))]
Output: [(0, 3)]
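If the overlapping spans are wanted for a case like this, one option is a zero-width lookahead, so the regex engine does not consume the match. A rough sketch, where the helper name overlapping_spans is only illustrative and the end index is computed from the substring length because a lookahead match itself is empty:

```python
import re

def overlapping_spans(text, sub):
    """(start, end) spans of all occurrences of sub, including overlapping ones."""
    # The lookahead matches at each start position without consuming characters
    pattern = "(?=" + re.escape(sub) + ")"
    return [(m.start(), m.start() + len(sub)) for m in re.finditer(pattern, text)]

print(overlapping_spans("ababa", "aba"))  # [(0, 3), (2, 5)]
```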
Upvotes: 25
Reputation: 61
All the solutions provided by others rely on find() or other built-in methods.
What is the core basic algorithm to find all the occurrences of a substring in a string?
def find_all(string,substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return:Returning a list
    """
    length = len(substring)
    c = 0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c += 1
    return indexes
You can also subclass str and put this function in the new class:
class newstr(str):
    def find_all(string,substring):
        """
        Function: Returning all the index of substring in a string
        Arguments: String and the search string
        Return:Returning a list
        """
        length = len(substring)
        c = 0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c += 1
        return indexes
Calling the method:
newstr.find_all('Do you find this answer helpful? then upvote this!','this')
Upvotes: 2
Reputation: 249
Please look at the code below:
#!/usr/bin/env python
# coding:utf-8
def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result
if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))
Upvotes: 0
Reputation: 522
This thread is a little old, but this worked for me:
numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"
marker = 0
while marker < len(numberString):
    try:
        found = numberString.index(testString, marker)
        print(found)
        marker = found + 1
    except ValueError:
        # index() raises ValueError once there are no more matches
        print("String not found")
        marker = len(numberString)
Upvotes: 6
Reputation: 15310
Use re.finditer
import re
sentence = input("Give me a sentence ")
word = input("What word would you like to find ")
for match in re.finditer(word, sentence):
    print (match.start(), match.end())
For word = "this" and sentence = "this is a sentence this this", this will yield the output:
(0, 4)
(19, 23)
(24, 28)
Upvotes: 76
Reputation: 12273
Again, old thread, but here's my solution using a generator and plain str.find
def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)
x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]
[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]
Upvotes: 71
Reputation: 121
This is an old thread, but I got interested and wanted to share my solution.
def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result
It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.
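The choice between k += 1 and k += len(sub) mentioned in the code comment can also be exposed as a parameter. Here is a hedged sketch of that idea as a generator; the name iter_occurrences and the overlapping flag are my own additions, and returning nothing for an empty substring is an assumption about what is wanted in that case.

```python
def iter_occurrences(text, sub, overlapping=True):
    """Yield start indexes of sub in text, using only str.find."""
    if not sub:
        return  # assumption: an empty substring yields no positions
    step = 1 if overlapping else len(sub)
    pos = text.find(sub)
    while pos != -1:
        yield pos
        pos = text.find(sub, pos + step)

print(list(iter_occurrences("ababa", "aba")))                     # [0, 2]
print(list(iter_occurrences("ababa", "aba", overlapping=False)))  # [0]
```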
Upvotes: 12