Reputation: 7644
I have a list of strings
L = ["Your acccount has beed deleted by the administrator", "Trouble while deleting account",
"Please delete my account"]
I want to find out whether any form of the word delete is present in each string.
Other forms/tenses of delete could be deleted, deleting, or deletion.
Similarly, for another word, say face, another form could be facing.
Is there any way to identify such cases using a regex?
As a sample: I am looking to write a pattern such that if I put delete in the pattern and call re.search,
re.search(r'\b(delete)\b',"I am deleted you")
it gives me a match on the word 'deleted' as well.
For example, I would like this loop:
import re
for i in L:
    if re.search(r'\b(delete)\b', i) is not None:
        print(i)
to print all three strings:
"Your acccount has beed deleted by the administrator",
"Trouble while deleting account",
"Please delete my account"
Upvotes: 1
Views: 771
Reputation: 142208
English is a terrible language to do this for.
/\b[dD]elet(e|es|ed|ion|ing)\b/
 ^^                         ^^  zero-width word boundary
   ^^^^  "d" or "D"
           ^ ^  ^  ^   ^   ^  any of this list
You do need to worry about initial caps, and perhaps all-caps. The "delete" example works for most verbs ending with "e".
/\b(see|saw|seen|sees)\b/
There are plenty of irregular verbs.
/\bleap(|s|t|ed|ing)\b/
Usage is turning "leapt" into "leaped". Or both are acceptable (depending on who you listen to). Or should that be "to whom you listen"?
And some words differ between British and American English.
And if you need to grab the word that was found, add parens at the appropriate place (or do as I did with \b(see|...)\b).
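Since the question uses Python, the patterns above can be sketched with the re module as follows. This is only a minimal sketch: forms_pattern and find_forms are hypothetical helper names, the lists of forms are written by hand, and re.IGNORECASE is used here in place of [dD] so that initial caps and all-caps are both covered.

```python
import re

L = [
    "Your acccount has beed deleted by the administrator",
    "Trouble while deleting account",
    "Please delete my account",
]

def forms_pattern(forms):
    """Build a case-insensitive pattern matching any listed form as a whole word."""
    # re.escape guards against regex metacharacters in the listed forms.
    alternation = "|".join(re.escape(form) for form in forms)
    # The parentheses capture the form that was actually found.
    return re.compile(rf"\b({alternation})\b", re.IGNORECASE)

# Hand-written lists of forms (assumptions for illustration; extend as needed).
DELETE = forms_pattern(["delete", "deletes", "deleted", "deletion", "deleting"])
SEE = forms_pattern(["see", "saw", "seen", "sees"])

def find_forms(pattern, texts):
    """Return (text, matched_word) pairs for the texts that contain any form."""
    hits = []
    for text in texts:
        m = pattern.search(text)
        if m is not None:
            hits.append((text, m.group(1)))
    return hits

matched = find_forms(DELETE, L)
for text, word in matched:
    print(f"{word!r} found in: {text}")
```

Listing every form explicitly keeps irregular verbs such as see/saw in the same scheme as regular ones, at the cost of maintaining the lists yourself.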
Upvotes: 1
Reputation: 626691
A regex is not really the right tool for this, and BoarGules illustrated that well with go/went.
If you want to solve the issue, you need an NLP tool like spaCy.
Here is an example:
import spacy
nlp = spacy.load("en_core_web_trf")
text = "Your acccount has beed deleted by the administrator"
doc = nlp(text)
[t.text for t in doc if t.lemma_ == 'delete']
## => ['deleted']
If you have a list of lemmas, replace if t.lemma_ == 'delete' with if t.lemma_ in your_list.
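To apply the same lemma check to every string in the question's list, something like the following might work. It is a sketch only: texts_with_lemmas is a hypothetical helper, and it assumes nlp is a loaded spaCy pipeline (or any callable that returns tokens with text and lemma_ attributes).

```python
def texts_with_lemmas(nlp, texts, lemmas):
    """Map each text to the tokens whose lemma is in `lemmas`; skip texts with none."""
    wanted = {lemma.lower() for lemma in lemmas}
    results = {}
    for text in texts:
        doc = nlp(text)
        hits = [t.text for t in doc if t.lemma_.lower() in wanted]
        if hits:
            results[text] = hits
    return results

# Usage with spaCy (requires the model to be installed):
# import spacy
# nlp = spacy.load("en_core_web_trf")
# texts_with_lemmas(nlp, L, ["delete", "face"])
```

Using a set for the lemmas keeps the membership test cheap when the list of words grows.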
Upvotes: 3