Reputation: 1256
I have a string of text and want to retain only specific words.
sample = "This is a test text. Test text should pass the test"
approved_list = ["test", "text"]
Expected output:
"test text Test text test"
I have read through a lot of regex-based answers, but unfortunately they do not address this specific issue.
Can the solution also be extended to a pandas series?
Upvotes: 1
Views: 31
Reputation: 294218
You don't need pandas for this. Use the regex module re:
import re
re.findall('|'.join(approved_list), sample, re.IGNORECASE)
['test', 'text', 'Test', 'text', 'test']
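The call above returns a list, while the question asks for a single string. It also matches substrings, so "test" would be found inside "testing". A minimal sketch of one way to handle both, assuming whole-word matching is what is wanted (the `pattern` and `result` names are only illustrative):

```python
import re

sample = "This is a test text. Test text should pass the test"
approved_list = ["test", "text"]

# Escape each word so regex metacharacters are treated literally,
# and wrap the alternation in \b so only whole words match.
pattern = r"\b(?:" + "|".join(map(re.escape, approved_list)) + r")\b"

# findall returns the matches in order; join them back into one string.
matches = re.findall(pattern, sample, flags=re.IGNORECASE)
result = " ".join(matches)
print(result)  # test text Test text test
```

If substring matches are acceptable, the plain '|'.join(approved_list) pattern is enough and only the " ".join step is needed.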
If you had a pd.Series:
import pandas as pd
sample = pd.Series(["This is a test text. Test text should pass the test"] * 5)
approved_list = ["test", "text"]
Use the str string accessor:
sample.str.findall('|'.join(approved_list), re.IGNORECASE)
0    [test, text, Test, text, test]
1    [test, text, Test, text, test]
2    [test, text, Test, text, test]
3    [test, text, Test, text, test]
4    [test, text, Test, text, test]
dtype: object
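Each row here holds a list. To get one string per row, as in the question's expected output, the lists can probably be joined with the same str accessor. A small sketch using the same pattern as in this answer (the `result` name is only illustrative):

```python
import re
import pandas as pd

sample = pd.Series(["This is a test text. Test text should pass the test"] * 5)
approved_list = ["test", "text"]

# str.findall yields a list of matches per row;
# str.join then glues each row's list into a single string.
result = sample.str.findall("|".join(approved_list), flags=re.IGNORECASE).str.join(" ")
print(result)
```

Every row of result should then be the string "test text Test text test".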
Upvotes: 2