Reputation: 715
I'm struggling to remove the first part of the URLs in the myID column of a CSV file.
my.csv
myID
https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:b1234567-9ee6-11b7-b4a2-7b8c2344daa8d
desired output for myID
b1234567-9ee6-11b7-b4a2-7b8c2344daa8d
my code:
df['myID'] = df['myID'].map(lambda x: x.lstrip('https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:'))
Actual output in myID (the leading 'b' is missing from the string):
1234567-9ee6-11b7-b4a2-7b8c2344daa8d
The above code removes https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib: as intended.
However, it also removes the first character of the ID when that character is a letter; if it's a number, the ID remains unchanged.
Could someone help with this? Thanks!
Upvotes: 3
Views: 452
Reputation: 33
You can use the re module (if the part before what you want to extract is always the same):
import re
idx = re.search(r':zib:', myID)
myNewID = myID[idx.end():]
Then you will have:
myNewID
'b1234567-9ee6-11b7-b4a2-7b8c2344daa8d'
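To apply the same idea to every value instead of a single string, one option is a small helper. This is only a minimal sketch: the name extract_id and the sample list are made up for illustration, and it assumes that values without the :zib: marker should be returned unchanged.

```python
import re

MARKER = re.compile(r':zib:')  # same marker as in the snippet above

def extract_id(url):
    """Return the text after ':zib:', or the input unchanged if the marker is absent."""
    match = MARKER.search(url)
    # re.search returns None when nothing matches, so guard before calling .end()
    return url[match.end():] if match else url

urls = [
    'https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:b1234567-9ee6-11b7-b4a2-7b8c2344daa8d',
    'no-marker-here',  # hypothetical value without the marker
]
ids = [extract_id(u) for u in urls]
```

With pandas, the same helper could probably be passed straight to df['myID'].map(extract_id).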
Upvotes: 0
Reputation: 1
lstrip removes every leading character that appears in the set of characters you pass as an argument; it does not treat the argument as a prefix. So:
string = 'abcd'
test = string.lstrip('ad')
print(test)  # bcd: 'a' is stripped, then stripping stops at 'b'
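Applied to the URL from the question, this explains the missing 'b'. The sketch below reuses the question's prefix and ID; the variable names are illustrative, and str.removeprefix assumes Python 3.9 or newer.

```python
PREFIX = 'https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:'
url = PREFIX + 'b1234567-9ee6-11b7-b4a2-7b8c2344daa8d'

# lstrip treats its argument as a set of characters, not as a prefix.
# 'b' occurs in the prefix (for example in "mybrand"), so it is stripped too;
# '1' does not occur in the prefix, so stripping stops there.
stripped = url.lstrip(PREFIX)

# str.removeprefix (Python 3.9+) removes only the exact leading substring.
removed = url.removeprefix(PREFIX)

print(stripped)
print(removed)
```

Recent pandas versions (1.4 and later, as far as I know) expose the same operation as df['myID'].str.removeprefix(...).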
If the part you want to keep always has the same length, you can slice the string like an array instead. For you, that would be something like:
df['myID'] = df['myID'].map(lambda x: x[-37:])  # keep the last 37 characters
However, for this to work, the part you want to extract from the string must have a constant size.
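If the IDs might vary in length but the prefix is fixed, slicing by the length of the prefix is one alternative. A minimal sketch, assuming the prefix from the question; drop_prefix is a hypothetical helper name, and values that do not start with the prefix are left alone.

```python
PREFIX = 'https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:'

def drop_prefix(value, prefix=PREFIX):
    """Slice off the prefix, but only when the value really starts with it."""
    if value.startswith(prefix):
        return value[len(prefix):]
    return value

with_prefix = drop_prefix(PREFIX + 'b1234567-9ee6-11b7-b4a2-7b8c2344daa8d')
without_prefix = drop_prefix('b1234567-9ee6-11b7-b4a2-7b8c2344daa8d')
```

It could then be used as df['myID'] = df['myID'].map(drop_prefix), and it also works on Python versions older than 3.9.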
Upvotes: 0
Reputation: 521314
You could try a regex replacement here:
df['myID'] = df['myID'].str.replace('^.*:', '', regex=True)
This approach simply removes everything from the start of myID up to, and including, the final colon, which leaves behind the UUID you want to keep.
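The pattern can be tried outside pandas with re.sub. This is just a sketch on a single string, using the URL from the question; it relies on the ID itself never containing a colon.

```python
import re

url = 'https://mybrand.com/trigger:open?Myservice=Email&recipient=brn:zib:b1234567-9ee6-11b7-b4a2-7b8c2344daa8d'

# '.*' is greedy, so '^.*:' consumes everything up to the LAST colon
result = re.sub(r'^.*:', '', url)

# A value without any colon does not match and is returned unchanged
unchanged = re.sub(r'^.*:', '', 'b1234567')

print(result)
```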
Upvotes: 1