Ashwin Geet D'Sa
Ashwin Geet D'Sa

Reputation: 7369

Getting list of unique substrings of a given string

The task is to obtain a unique list of substrings in python.

I am currently using the breakup of the problem into 2 parts: obtain a list of all the substrings, followed by obtaining unique substrings.

I am using the below code:

substrings=[]
for i in range(0,len(inputstring)+1):
    for j in range(i+1,len(inputstring)+1):
        substr=inputstring[i:j]
        substrings.append(substr)
uniq=[]
for ss in substrings:
    if ss not in uniq:
        uniq.append(ss)

Is there a faster way to solve this problem or a so-called python way of doing it in a more flexible way?

a simple example string being: "aabaa", possible substrings are [a,a,b,a,a,aa,ab,ba,aa,aab,aba,baa,aaba,abaa,aabaa], unique substring which is desired at the end [a,b,aa,ab,ba,aab,aba,baa,aaba,abaa,aabaa]

Upvotes: 1

Views: 2486

Answers (2)

Edwin Wallis
Edwin Wallis

Reputation: 71

Use a set instead of a list for the second part. Finding something in a list costs O(n) while in a set it costs O(1) and you don't have to check if its new. Sets won't add something if it is already in the list.

uniq=set()
for i in range(0,len(inputstring)+1):
    for j in range(i+1,len(inputstring)+1):
        substr=inputstring[i:j]
        uniq.add(substr)

Upvotes: 0

Pixel
Pixel

Reputation: 101

Use Itertools and Set. Similar to the answer of Edwin but with Itertools, and in one line.

import itertools

uniq=list(set([inputstring[x:y] for x, y in itertools.combinations(
            range(len(inputstring) + 1), r = 2)]))

Basically you use itertools to first find all combinations, then set to find unique elements, then cast to list.

Code for combinations taken from https://www.geeksforgeeks.org/python-get-all-substrings-of-given-string/

Edit for a clearer explanation: First, use combinations to get all pair of indexes corresponding to substrings. The trick here is that itertools.combinations starts with all (0,X) pairs, and then (1,X) pairs, etc. Since we are using combinations and not permutations, we eliminate thus reverse substrings such as (1,0) since they will have been seen in the (0,X) enumerations.

Then simply use these with a list comprehension to get all substrings, use a set to find unique elements, and cast to a list.

Hope that helps

Upvotes: 1

Related Questions