Lekr0

Reputation: 733

Total number of times a substring can be found in a string

I am having a hard time understanding the code below, especially the last line.

string = "abcdcdc"

sub_string = "cdc"

print(sum([1 for i in range(0, len(string) - len(sub_string) + 1) if (string[i:(len(sub_string) + i)] == sub_string)]))

The output of the code is 2.

The code prints the number of times the substring is found in the string given above.

Any explanation would be appreciated.

Upvotes: 0

Views: 535

Answers (5)

tevemadar

Reputation: 13195

It is a so-called list comprehension. That is simply its syntax, which is why it may look like an if with no visible outcome placed in a for loop.

The steps are not complicated otherwise:

  • we need all 3-character substrings of the big string. The range(0, len(string) - len(sub_string) + 1) part prepares the starting indices for those: 7-3+1=5, so the range runs over 0...4. You can check it in the interactive shell, it will say range(0, 5)
  • then the [] part creates a list, you can check that too: [1 for i in range(0,5)] creates [1, 1, 1, 1, 1]
  • but you want to compare things, with the if part. First you could look at the substrings: instead of 1, write the slice expression, [string[i:(len(sub_string) + i)] for i in range(0,5)], as Karl Graham suggests, which results in ['abc', 'bcd', 'cdc', 'dcd', 'cdc']. Then you could use the full comparison, [string[i:(len(sub_string) + i)]==sub_string for i in range(0,5)], which produces [False, False, True, False, True]
  • in fact you can sum this already: sum([string[i:(len(sub_string) + i)]==sub_string for i in range(0,5)]) outputs 2, because True counts as 1 and False as 0. Here, whoever wrote the code decided to produce actual numbers and used the optional if clause of the list comprehension, creating a list of 1s only for the two matching positions: [1 for i in range(0, len(string) - len(sub_string) + 1) if (string[i:(len(sub_string) + i)] == sub_string)] gives [1, 1]
  • and sum adds those 1s together, resulting in 2.

Summary code for running as Python snippet or in a notebook (like here):

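string = "abcdcdc"
sub_string = "cdc"
print(len(string))  # 7
print(len(sub_string))  # 3
# start indices of every window that is len(sub_string) characters long
print(range(0,len(string)-len(sub_string)+1))  # range(0, 5)
print([1 for i in range(0,5)])  # [1, 1, 1, 1, 1]
# the windows themselves: ['abc', 'bcd', 'cdc', 'dcd', 'cdc']
print([string[i:(len(sub_string) + i)] for i in range(0,5)])
# each window compared with sub_string: [False, False, True, False, True]
print([string[i:(len(sub_string) + i)]==sub_string for i in range(0,5)])
# True counts as 1 and False as 0, so this prints 2
print(sum([string[i:(len(sub_string) + i)]==sub_string for i in range(0,5)]))
# the original form: a 1 only for matching positions, [1, 1], then the sum, 2
print([1 for i in range(0, len(string) - len(sub_string) + 1) if (string[i:(len(sub_string) + i)] == sub_string)])
print(sum([1 for i in range(0, len(string) - len(sub_string) + 1) if (string[i:(len(sub_string) + i)] == sub_string)]))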
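
If the goal is to reuse the idea, the same counting can be wrapped in a small function. This is only a sketch: count_overlapping is a hypothetical name, not something from the question, and it uses the "sum the comparisons" variant with a generator expression instead of a list.

```python
def count_overlapping(text, pattern):
    """Count how often pattern occurs in text, overlapping matches included."""
    width = len(pattern)
    # One comparison per start index; True adds 1 to the sum, False adds 0.
    return sum(text[i:i + width] == pattern
               for i in range(len(text) - width + 1))


print(count_overlapping("abcdcdc", "cdc"))  # 2
```

When the pattern is longer than the text the range is empty, so the function returns 0.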

Upvotes: 0

Filipp Voronov

Reputation: 4197

[1 for i in range(0, len(string) - len(sub_string) + 1) if (string[i:(len(sub_string) + i)] == sub_string)]

means: loop i over the range from 0 up to (but not including) len(string) - len(sub_string) + 1, and if the slice of string that starts at index i and has the length of sub_string (i.e. runs up to index len(sub_string) + i) is equal to sub_string, take 1. The 1s are collected into a list, so the result is [1, 1] because sub_string occurs in string two times.

See Python List Comprehension for more details.


sum([1 for i in range(0, len(string) - len(sub_string) + 1) if (string[i:(len(sub_string) + i)] == sub_string)])

It just sums the list described above: sum([1, 1]) equals 2.
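
For comparison, here is roughly what the one-liner does when written as an ordinary loop. The names ones and window are only illustrative and do not appear in the question.

```python
string = "abcdcdc"
sub_string = "cdc"

ones = []
for i in range(0, len(string) - len(sub_string) + 1):
    # Slice out the piece of string that starts at i and is as long as sub_string.
    window = string[i:len(sub_string) + i]
    if window == sub_string:
        ones.append(1)

print(ones)       # [1, 1]
print(sum(ones))  # 2
```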

Upvotes: 1

Arkadiusz Tymieniecki

Reputation: 116

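It's Python, so you have to read it backwards: 'if string contains substring, try to find number of substring occurrences.' For non-overlapping matches I'd write it this way:

 'abcdcdc'.count('cdc')

Note that str.count only counts non-overlapping occurrences, so here it returns 1, not the 2 printed by the original code.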
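
A minimal sketch comparing the two results. count_with_find is a hypothetical helper, not part of the original code; it restarts the search one character after each hit, so overlapping matches are included.

```python
string = "abcdcdc"
sub_string = "cdc"

# str.count skips past each match, so overlapping matches are not counted.
non_overlapping = string.count(sub_string)


def count_with_find(text, pattern):
    """Count matches of pattern in text, allowing them to overlap."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Resume one character after the last hit instead of after the match.
        start = text.find(pattern, start + 1)
    return count


print(non_overlapping)                      # 1
print(count_with_find(string, sub_string))  # 2
```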

Upvotes: 0

Glostas

Reputation: 1180

The code loops over the indices of string, from the start to the end minus the number of elements in sub_string.

The [] creates a list, and a 1 is written into it whenever the next 3 elements of the string are the same as sub_string.

sum() returns the sum of the list. Since a 1 is included every time sub_string is found, this counts the number of occurrences of sub_string in string.

Upvotes: 0

Karl Graham

Reputation: 151

If you view the list of values generated by the for loop you will find it creates the list below:

print([string[i:(len(sub_string) + i)] for i in range(0, len(string) - len(sub_string) + 1)])
['abc', 'bcd', 'cdc', 'dcd', 'cdc']

The list contains the substring you are searching for twice, which is the result you obtain.
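
Building on that list, one possible variation (not what the original code does) is to let list.count do the tallying. The name windows is only illustrative.

```python
string = "abcdcdc"
sub_string = "cdc"

# Every slice of string that has the same length as sub_string.
windows = [string[i:(len(sub_string) + i)]
           for i in range(0, len(string) - len(sub_string) + 1)]

print(windows)                    # ['abc', 'bcd', 'cdc', 'dcd', 'cdc']
print(windows.count(sub_string))  # 2
```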

Upvotes: 0
