Reputation: 4077
What is the procedure to remove a word from a string in one column when that word occurs in another column?
For example:
Sr  A     B                       C
1   jack  jack and jill           and jill
2   run   you should run,         you should ,
3   fly   you shouldnt fly,there  you shouldnt ,there
As can be seen, I want column C to be B minus the contents of A. Please note the third example, where fly is followed by a comma, so the solution should also take punctuation into account (in case the code relies on detecting a space around the word). Column A can also contain two words, and these need to be removed as well.
I need an expression in Pandas, something like:
df.apply(lambda x: x["C"].replace(r"\b"+x["A"]+r"\b", "").strip(), axis=1)
Upvotes: 5
Views: 3096
Reputation: 71538
Try this:
x['C'] = x['B'].replace(to_replace=r'\b'+x['A']+r'\b', value='',regex=True)
This is based on a previous answer in which someone showed me exactly how to do this in pandas. I changed it a little to suit the current situation :)
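For reference, here is a minimal row-by-row sketch of the same word-boundary idea. It assumes the frame is called df and that columns A and B hold plain strings, as in the question's sample. The helper name remove_word is hypothetical, and re.escape is an addition not present in the line above, included so that any regex metacharacters in A are matched literally.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Sr": [1, 2, 3],
    "A": ["jack", "run", "fly"],
    "B": ["jack and jill", "you should run,", "you shouldnt fly,there"],
})


def remove_word(row):
    """Remove the text in column A from column B, matching on word boundaries."""
    # re.escape is an assumption here: it treats A as literal text, not a regex.
    pattern = r"\b" + re.escape(row["A"]) + r"\b"
    # Drop every match, then trim whitespace left at either end.
    return re.sub(pattern, "", row["B"]).strip()


# axis=1 passes each row to the helper, so A and B are paired per row.
df["C"] = df.apply(remove_word, axis=1)
print(df)
```

Because the pattern is built per row, a two-word value in A is handled the same way as a single word. Whitespace inside the string is left as it is; only the ends are trimmed.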
Upvotes: 3
Reputation: 28946
How does this look?
In [24]: df
Out[24]:
   Sr     A                       B
0   1  jack           jack and jill
1   2   run         you should run,
2   3   fly  you shouldnt fly,there
[3 rows x 3 columns]
In [25]: df.apply(lambda row: row.B.strip(row.A), axis=1)
Out[25]:
0                 and jill
1          you should run,
2    ou shouldnt fly,there
dtype: object
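Note that this output differs from the requested column C, because str.strip treats its argument as a set of characters to trim from both ends of the string, not as a word to remove. A small sketch in plain Python, using one of the sample strings from the question (the variable names are only illustrative):

```python
import re

s = "you shouldnt fly,there"

# str.strip("fly") trims any leading or trailing 'f', 'l' or 'y' characters.
# Only the leading 'y' is affected; the word "fly" in the middle is untouched.
stripped = s.strip("fly")

# A word-boundary regex removes the whole word wherever it occurs.
removed = re.sub(r"\bfly\b", "", s)

print(repr(stripped))
print(repr(removed))
```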
Upvotes: 5