Reputation: 33
I have a data-frame that has text in the first column named 'original_column'.
I have successfully been able to pick specific words out of the text column 'original_column' with a list and have them appended to another column and deleted from the original column with the following code:
import pandas as pd

list1 = {'text', 'and', 'example'}
# First whitespace-separated token that appears in list1, or None if there is none
finder = lambda x: next(iter([y for y in x.split() if y in list1]), None)
df['list1'] = df.original_column.apply(finder)
df['original_column'] = df['original_column'].replace(regex=r'(?i)' + df['list1'], value="")
I would now like to build on this code so that, after the matched word is appended to the new column, ONLY THE FIRST instance of that word is deleted from 'original_column'.
The data-frame currently looks like this:
| original_column |
|-----------------|
| text text word  |
| and other and   |
My current code outputs this:
| original_column | list1 |
|-----------------|-------|
| word            | text  |
| other           | and   |
My desired output is:
| original_column | list1 |
|-----------------|-------|
| text word       | text  |
| other and       | and   |
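A minimal sketch of one way to get the desired output, reusing the same whitespace-token matching as finder above: split each string, take the first token found in list1, and delete only that one position. It assumes words are separated by whitespace and that case-sensitive matching is acceptable; the helper name split_first_listed_word is hypothetical.

```python
import pandas as pd

# Word list from the question (a set, so membership tests are fast)
list1 = {'text', 'and', 'example'}


def split_first_listed_word(text):
    """Return (text without its first listed word, that word).

    Hypothetical helper: tokens are whitespace-separated, matching is
    case-sensitive, and rows with no listed word come back unchanged.
    """
    tokens = text.split()
    for i, tok in enumerate(tokens):
        if tok in list1:
            del tokens[i]  # drop only this first hit; later repeats stay
            return ' '.join(tokens), tok
    return text, None


df = pd.DataFrame({'original_column': ['text text word', 'and other and']})
pairs = [split_first_listed_word(t) for t in df['original_column']]
df['original_column'] = [rest for rest, _ in pairs]
df['list1'] = [word for _, word in pairs]
print(df)
```

Because the remaining tokens are re-joined with single spaces, any irregular spacing in the original text is normalized as a side effect.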
Upvotes: 2
Views: 52
Reputation: 71689
Assuming the following DataFrame:
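import pandas as pd
list1 = ['text', 'and', 'example']  # same words as in the question
df = pd.DataFrame({"original_column": ["text text word", "text and text"]})
Use:
import re
# Each listed word together with any surrounding whitespace
pattern = '|'.join(rf"\s*{re.escape(item)}\s*" for item in list1)
regex = re.compile(pattern)

def extract_words(s):
    # Collect every matched word into list1 ...
    s['list1'] = ' '.join(map(str.strip, regex.findall(s['original_column'])))
    # ... and remove all of them from original_column
    s['original_column'] = regex.sub(' ', s['original_column']).strip()
    return s

df = df.apply(extract_words, axis=1)
print(df)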
This results in the following DataFrame df:
original_column list1
0 text text word
1 text text and
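The approach above moves every match. A sketch of a variant that moves only the first match, assuming the same compiled-regex style: regex.search finds the first listed word and regex.sub(..., count=1) deletes just that occurrence. The \b word boundaries (instead of \s* on both sides) and the helper name extract_first_word are assumptions for illustration; returning a new Series avoids mutating the row passed to apply.

```python
import re

import pandas as pd

list1 = ['text', 'and', 'example']  # same words as in the question

# \b...\b matches whole words only (an assumption: the answer above uses \s*
# on both sides instead); the trailing \s* swallows the following space.
regex = re.compile('|'.join(rf"\b{re.escape(w)}\b\s*" for w in list1))


def extract_first_word(row):
    text = row['original_column']
    m = regex.search(text)
    word = m.group().strip() if m else None
    # count=1 removes only the first match; later duplicates remain
    rest = regex.sub('', text, count=1).strip()
    return pd.Series({'original_column': rest, 'list1': word})


df = pd.DataFrame({"original_column": ["text text word", "and other and"]})
df = df.apply(extract_first_word, axis=1)
print(df)
```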
Upvotes: 0
Reputation: 323226
Let us use replace:
df['original column']=df['original column'].replace(regex=r'(?i)'+ df['list1'],value="")
df
Out[101]:
original column list1
0 text text word
1 text text and
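If only the first occurrence should be removed, a vectorized sketch with the pandas string accessor may also work, assuming one combined pattern for all listed words is acceptable: Series.str.extract pulls out the first listed word per row, and Series.str.replace with n=1 deletes only the first match. The word-boundary pattern is an assumption; flags=re.IGNORECASE mirrors the (?i) used above. regex=True is passed explicitly because recent pandas versions default str.replace to literal matching.

```python
import re

import pandas as pd

list1 = ['text', 'and', 'example']  # same words as in the question
df = pd.DataFrame({'original column': ['text text word', 'and other and']})

# One alternation of all listed words, escaped so they match literally
words = '|'.join(map(re.escape, list1))

# First listed word in each row (NaN where none matches)
df['list1'] = df['original column'].str.extract(
    rf'\b({words})\b', flags=re.IGNORECASE, expand=False
)

# n=1 limits the replacement to the first match in each row
df['original column'] = (
    df['original column']
    .str.replace(rf'\b(?:{words})\b\s*', '', n=1, flags=re.IGNORECASE, regex=True)
    .str.strip()
)
print(df)
```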
Upvotes: 1