Reputation: 3161
I have a situation where spurious data is being created, and I am trying to clean it up.
For example:
[email protected]/!ut/5 #RealLink
[email protected]/ut1/5_RTFDEERERTGFEFD # system adds junk to it
[email protected]/ut1/5_dvkerfddfrejermsdkasmf # system adds junk to it
I am trying to clean this up by dropping everything after !ut.
So far I have tried:
SPA_MX = Mexico['Page URL'].str.startswith("http://[email protected]/ut1")
but this returns a boolean Series.
I would like advice on the most efficient way to achieve this.
Upvotes: 1
Views: 38
Reputation: 394071
You can do this using apply on the column, then use find to return the index of the pattern and slice the string if it is found:
In[69]:
df['url'].apply(lambda x: x[:x.find('!ut') + 3] if x.find('!ut') != -1 else x)
Out[69]:
0 [email protected]/!ut
1 [email protected]/ut1/5_RTFDEERERTGFEFD
2 [email protected]/ut1/5_dvkerfddfrejermsdkasmf
Name: url, dtype: object
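If you would rather avoid apply, a vectorized alternative is a regex replace on the column. A minimal sketch, assuming a DataFrame named df with a 'url' column holding the example data above:

import pandas as pd

df = pd.DataFrame({'url': ['[email protected]/!ut/5',
                           '[email protected]/ut1/5_RTFDEERERTGFEFD',
                           '[email protected]/ut1/5_dvkerfddfrejermsdkasmf']})

# keep everything up to and including '!ut'; rows that do not contain '!ut' are left unchanged
df['url'] = df['url'].str.replace(r'(!ut).*', r'\1', regex=True)

This should produce the same result as the apply version above.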
Upvotes: 1
Reputation: 583
my_string = "[email protected]/!ut/5"
final = my_string.split("!ut")[0]
print(final)
output:
[email protected]/
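To apply the same split to every row of a pandas column rather than a single string, a sketch along these lines should work; the Mexico DataFrame and 'Page URL' column names are taken from the question, and the small frame below is only illustrative:

import pandas as pd

Mexico = pd.DataFrame({'Page URL': ['[email protected]/!ut/5',
                                    '[email protected]/ut1/5_RTFDEERERTGFEFD']})

# vectorized equivalent of my_string.split("!ut")[0] for every row;
# note this drops the '!ut' itself as well as everything after it
Mexico['Page URL'] = Mexico['Page URL'].str.split('!ut').str[0]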
Upvotes: 1