Reputation: 113
How do I find the words from a list in a string and remove the found word along with any words after it?
For example:
remove_words = ['stack', 'over', 'flow']
Input:
0 abc test test stack yxz
1 cde test12 over ste
2 def123 flow test123
3 yup over 4562
I would like to find the words from the remove_words list in a pandas DataFrame column and remove each found word together with any words after it.
Results:
0 abc test test
1 cde test12
2 def123
3 yup
Upvotes: 2
Views: 229
Reputation: 2674
remove_words = ['stack', 'over', 'flow']
inputline = "abc test test stack yxz"
for word in inputline.split(" "):
    if word in remove_words:
        # slice off the found word and everything after it
        print(inputline[:inputline.index(word)].rstrip())
        break
This splits the input string into a list of words, finds the position of the first word that is in the remove_words list, and slices off everything from that point on. To apply it to your whole dataset, loop over the column instead of using the hard-coded string.
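A sketch of that loop applied to a whole column, assuming the data sits in a DataFrame column named col (the column name and the helper cut_at_first_match are illustrative, not from the question). It walks the words, stops at the first one found in remove_words, and joins the words before it:

```python
import pandas as pd

remove_words = ['stack', 'over', 'flow']

def cut_at_first_match(text, words):
    """Hypothetical helper: keep only the words before the first listed word."""
    kept = []
    for word in text.split():
        if word in words:
            break  # stop at the first word from the list
        kept.append(word)
    return " ".join(kept)

df = pd.DataFrame({'col': ['abc test test stack yxz', 'cde test12 over ste',
                           'def123 flow test123', 'yup over 4562']})
# Series.apply forwards the extra keyword argument to the helper
df['col'] = df['col'].apply(cut_at_first_match, words=set(remove_words))
print(df)
```

Comparing whole words rather than using str.index avoids matching a substring inside a longer word (for example "over" inside "overt"). Joining with a single space does collapse any repeated whitespace in the kept part.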
Upvotes: 0
Reputation: 862641
Use split with all the values joined by | (regex OR), and select the first element of each resulting list with str[0]:
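import re
import pandas as pd

df = pd.DataFrame({'col': ['abc test test stack yxz', 'cde test12 over ste',
                           'def123 flow test123', 'yup over 4562']})
remove_words = ['stack', 'over', 'flow']
#for more general solution with word boundary; (?:...) makes \b apply to every word
pat = r'\b(?:{})\b'.format('|'.join(map(re.escape, remove_words)))
df['col'] = df['col'].str.split(pat, n=1).str[0].str.rstrip()
print(df)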
col
0 abc test test
1 cde test12
2 def123
3 yup
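A detail about building this pattern: with a plain r'\b{}\b'.format('|'.join(remove_words)), the | operator has the lowest precedence, so the result \bstack|over|flow\b anchors only the first and last alternatives. Wrapping the alternatives in a non-capturing group (?:...) makes \b apply to every word. A small check with Python's standard re module (the sample string is made up for illustration; re.escape is a precaution in case a word contains regex metacharacters):

```python
import re

remove_words = ['stack', 'over', 'flow']
ungrouped = r'\b{}\b'.format('|'.join(remove_words))  # \bstack|over|flow\b
grouped = r'\b(?:{})\b'.format('|'.join(map(re.escape, remove_words)))

text = 'cde test12 leftover ste'
# Without the group, the bare alternative "over" matches inside "leftover".
print(re.split(ungrouped, text, maxsplit=1)[0])  # prints: cde test12 left
# With the group, \b applies to every alternative, so nothing is split here.
print(re.split(grouped, text, maxsplit=1)[0])    # prints: cde test12 leftover ste
```

In recent pandas versions, Series.str.split treats a multi-character pat as a regular expression, so the grouped pattern behaves the same way there.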
Upvotes: 2
Reputation: 533
I have not worked with pandas DataFrames, but the concept should be the same in any language: loop through all the words and use a replace method with an empty string.
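A minimal sketch of the replace approach in plain Python, reading "replace method" as str.replace. Replacing each word with an empty string removes only the matched word itself; the words after it stay. Cutting the string at the first matched word gives the result the question asks for:

```python
remove_words = ['stack', 'over', 'flow']
text = 'abc test test stack yxz'

replaced = text
for word in remove_words:
    # removes every occurrence of the word, but nothing after it
    replaced = replaced.replace(word, '')
print(replaced.split())  # ['abc', 'test', 'test', 'yxz']

# Cutting at the first matched word instead drops the tail as well.
words = text.split()
cut = next((i for i, w in enumerate(words) if w in remove_words), len(words))
print(' '.join(words[:cut]))  # abc test test
```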
Upvotes: 0
Reputation: 433
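The first step would be to check whether the input contains any of the words; if it does not, you can just return the entire input:
# none of the words present: return the input unchanged
if not any(word in input.split() for word in ["stack", "over", "flow"]):
    return input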
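A common pitfall when writing this check: the condition if "stack" or "over" or "flow" not in input: is parsed as "stack" or "over" or ("flow" not in input), and a non-empty string literal is always truthy, so the whole expression evaluates to "stack" no matter what the input contains. A quick check comparing that form with any() (text is used here instead of input to avoid shadowing the built-in):

```python
text = 'abc test test stack yxz'

naive = "stack" or "over" or "flow" not in text
print(naive)  # stack (a truthy string, whatever text contains)

remove_words = ['stack', 'over', 'flow']
# True only when none of the words appears as a whole word in text
no_match = not any(word in text.split() for word in remove_words)
print(no_match)  # False
```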
Now for the removing part. I think the best way to do this is to loop through each value in the input array (I am assuming it is an array) and call a replace function on it (str_replace in PHP, str.replace in Python).
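Since str.replace only swaps out the matched text, removing the matched word and everything after it in a single replace call needs a regular-expression substitution instead. A sketch assuming the input is a list of single-line strings (the pattern name tail and the helper strip_tail are hypothetical):

```python
import re

remove_words = ['stack', 'over', 'flow']
# Optional leading whitespace, a listed whole word, then the rest of the line.
tail = re.compile(r'\s*\b(?:{})\b.*'.format('|'.join(map(re.escape, remove_words))))

def strip_tail(values):
    """Remove the first listed word and everything after it from each value."""
    # sub leaves a value unchanged when it contains none of the words,
    # which covers the "return the entire input" case.
    return [tail.sub('', value, count=1) for value in values]

print(strip_tail(['abc test test stack yxz', 'cde test12 over ste',
                  'def123 flow test123', 'yup over 4562', 'no match here']))
# ['abc test test', 'cde test12', 'def123', 'yup', 'no match here']
```

Because . does not match a newline by default, this only cuts to the end of the current line in multi-line values.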
Upvotes: 0