Need String Comparison 's solution for Partial String Comparison in Python

Question

Scenario: I have some tasks performed for respective "Section Header"(Stored as String), result of that task has to be saved against same respective "Existing Section Header"(Stored as String)

While mapping if respective task's "Section Header" is one of the "Existing Section Header" task results are added to it. And if not, new Section Header will get appended to the Existing Section Header List.

Existing Section Header Looks Like This:

[ "Activity (Last 3 Days)", "Activity (Last 7 days)", "Executable running from disk", "Actions from File"]

For below set of String the expected behaviour is as follows:

"Activity (Last 30 Days) - New Section Should be Added

"Executables running from disk" - Same existing "Executable running from disk" should be referred [considering extra "s" in Executables same as "Executable".

"Actions from a file" - Same existing "Actions from file" should be referred [Considering extra article "a"]

Is there any built-in function available python that may help incorporate same logic. Or any suggestion regarding Algorithm for this is highly appreciated.

Irshad Bhat · Accepted Answer

Since you want to compare only stem or "root word" of a given word, I suggest using some stemming algorithm. Stemming algorithms attempt to automatically remove suffixes (and in some cases prefixes) in order to find the "root word" or stem of a given word. This is useful in various natural language processing scenarios, such as search. Luckily there is a python package for stemming. You can download it from here.

Next you want to compare string without stop-words (a,an,the,from, etc.). So you need to filter these words before comparing strings. You can get a list of stop-words from internet or you can use nltk package to import stop-words list. You can get nltk from here

If there is any issue with nltk, here is the list of stop words:

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself',
 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which',
 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be',
 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an',
 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for',
 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after',
 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under',
 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all',
 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not',
 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don',
 'should', 'now']

Now use this simple code to get your desired output:

from stemming.porter2 import stem
from nltk.corpus import stopwords
stopwords_ =  stopwords.words('english')
def addString(x):
   flag = True
   y = [stem(j).lower() for j in x.split() if j.lower() not in stopwords_]
   for i in section:
      i = [stem(j).lower() for j in i.split() if j.lower() not in stopwords_]
      if y==i:
         flag = False
         break
   if flag:
      section.append(x)
      print "	New Section Added"

Demo:

>>> from stemming.porter2 import stem
>>> from nltk.corpus import stopwords
>>> stopwords_ =  stopwords.words('english')
>>> 
>>> def addString(x):
...    flag = True
...    y = [stem(j).lower() for j in x.split() if j.lower() not in stopwords_]
...    for i in section:
...       i = [stem(j).lower() for j in i.split() if j.lower() not in stopwords_]
...       if y==i:
...          flag = False
...          break
...    if flag:
...       section.append(x)
...       print "	New Section Added"
... 
>>> section = [ "Activity (Last 3 Days)", "Activity (Last 7 days)", "Executable running from disk", "Actions from File"]  # initial Section list
>>> addString("Activity (Last 30 Days)")
    New Section Added
>>> addString("Executables running from disk")
>>> addString("Actions from a file")
>>> section
['Activity (Last 3 Days)', 'Activity (Last 7 days)', 'Executable running from disk', 'Actions from File', 'Activity (Last 30 Days)']  # Final section list

Need String Comparison 's solution for Partial String Comparison in Python

Answers (2)

Related Questions

Need String Comparison &#39;s solution for Partial String Comparison in Python

Answers (2)

Related Questions

Need String Comparison 's solution for Partial String Comparison in Python