Shubham R

Reputation: 7644

Regex pattern matching for different tenses of a word

I have a list of strings

L  = ["Your acccount has beed deleted by the administrator", "Trouble while deleting account",
     "Please delete my account"]

I want to find out whether any form of the word delete is present in each string. The forms/tenses to match could be delete, deleted, deleting, deletion.

Similarly, for another word, say face, another form could be facing. Is there any way to identify such cases using regex?

As a sample word:

I want to write a pattern such that, if I put delete in the pattern and call re.search

re.search(r'\b(delete)\b',"I am deleted you")

It should give me a match of the word 'deleted' as well.

For example, this loop:

import re

for i in L:
    if re.search(r'\b(delete)\b', i) is not None:
        print(i)

should print all three strings:

"Your acccount has beed deleted by the administrator", 
"Trouble while deleting account",
"Please delete my account"
    

Upvotes: 1

Views: 771

Answers (2)

Rick James

Reputation: 142208

English is a terrible language to do this for.

/\b[dD]elet(e|es|ed|ion|ing)\b/
 ^^                         ^^   zero-width word boundary
   ^^^^                          "d" or "D"
           ^ ^  ^  ^   ^   ^     any of this list 

You do need to worry about initial caps, and perhaps all-caps. The "delete" example works for most verbs ending with "e".
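In Python, one way to handle initial caps and all-caps is the re.IGNORECASE flag instead of spelling out [dD]. A minimal sketch applied to the list from the question (the names DELETE_FORMS and matches are illustrative, not from the question):

```python
import re

L = [
    "Your acccount has beed deleted by the administrator",
    "Trouble while deleting account",
    "Please delete my account",
]

# Stem "delet" plus an explicit list of endings. IGNORECASE covers
# "Delete" and "DELETE" without needing a character class.
DELETE_FORMS = re.compile(r"\bdelet(?:e|es|ed|ion|ing)\b", re.IGNORECASE)

# Keep every string that contains at least one of the listed forms.
matches = [s for s in L if DELETE_FORMS.search(s)]
for s in matches:
    print(s)
```

The word boundaries keep the pattern from firing inside longer words such as "undeleted", which may or may not be what you want.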

/\b(see|saw|seen|sees)\b/

There are plenty of irregular verbs.

/\bleap(|s|t|ed|ing)\b/

Usage is turning "leapt" into "leaped". Or both are acceptable (depending on who you listen to). Or should that be "to whom you listen"?
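If you end up writing many such patterns, the stem-plus-endings idea can be wrapped in a small helper. This is only a sketch: forms_pattern is a hypothetical name, and you still have to supply the endings (and any irregular forms) by hand for each word.

```python
import re

def forms_pattern(stem, endings):
    """Compile a case-insensitive pattern for `stem` followed by any of `endings`.

    An empty string in `endings` allows the bare stem, as in leap(|s|t|ed|ing).
    """
    alternatives = "|".join(re.escape(e) for e in endings)
    return re.compile(rf"\b{re.escape(stem)}(?:{alternatives})\b", re.IGNORECASE)

leap = forms_pattern("leap", ["", "s", "t", "ed", "ing"])
face = forms_pattern("fac", ["e", "es", "ed", "ing"])

print(bool(leap.search("She leapt over the wall")))  # True
print(bool(leap.search("a leap year")))              # True
print(bool(face.search("I am facing an issue")))     # True
print(bool(face.search("a multifaceted problem")))   # False
```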

And some words differ between British and American English.

And if you need to grab the word found, then add parens at the appropriate place (or do as I did with \b(see|...)\b).
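In Python, grabbing the word might look like the sketch below. The sample sentence and the names SEE_FORMS, found and first are made up for illustration.

```python
import re

SEE_FORMS = re.compile(r"\b(see|saw|seen|sees)\b", re.IGNORECASE)

text = "I saw it, she sees it, and everyone has seen it."

# With exactly one capturing group, findall returns that group's text
# for every match, in order of appearance.
found = SEE_FORMS.findall(text)
print(found)  # ['saw', 'sees', 'seen']

# search stops at the first match; group(1) is the captured word.
first = SEE_FORMS.search(text)
print(first.group(1))  # saw
```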

Upvotes: 1

Wiktor Stribiżew

Reputation: 626691

Regex is not really the right tool for this, as BoarGules illustrated well with go/went.

If you want to solve the issue properly, you need an NLP tool like spaCy.

Here is an example:

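# Requires spaCy and the model: python -m spacy download en_core_web_trf
import spacy

nlp = spacy.load("en_core_web_trf")
text = "Your acccount has beed deleted by the administrator"
doc = nlp(text)
# Collect every token whose lemma (dictionary form) is "delete"
print([t.text for t in doc if t.lemma_ == 'delete'])
## => ['deleted']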

If you have a list of lemmas, replace if t.lemma_ == 'delete' with if t.lemma_ in your_list.
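To run that check over a whole list of strings, the filter can be put in a small function. This is a sketch only: texts_with_lemmas is a hypothetical helper, and it takes the pipeline as a parameter, assuming any callable that returns tokens with .text and .lemma_ attributes (such as the nlp object loaded above).

```python
def texts_with_lemmas(texts, lemmas, nlp):
    """Return (text, matched_tokens) pairs for texts containing any target lemma.

    `nlp` is assumed to be a callable that turns a string into an iterable
    of tokens exposing `.text` and `.lemma_`, e.g. a loaded spaCy pipeline.
    """
    wanted = {lemma.lower() for lemma in lemmas}
    results = []
    for text in texts:
        # Keep the surface form of every token whose lemma is in the target set.
        hits = [t.text for t in nlp(text) if t.lemma_.lower() in wanted]
        if hits:
            results.append((text, hits))
    return results

# Usage with a loaded spaCy pipeline (requires spaCy and a model):
# texts_with_lemmas(L, ["delete", "face"], nlp)
```

Which tokens get matched depends entirely on the lemmas the pipeline assigns, so results can vary between models.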

Upvotes: 3
