Reputation: 581
I get the desired output with the following code:
row='s3://bucket-name/qwe/2022/02/24/qwe.csv'
new_row = row.split('s3://bucket-name/')[1]
print(new_row)
qwe/2022/02/24/qwe.csv
I want to achieve this while having the bucket name saved in a variable, like this:
bucket_name="bucket-name"
new_row = row.split('s3://'+bucket_name+'/')[1]
This doesn't work (says invalid syntax).
Is there another way I can define this or will I have to use a different function to split?
Upvotes: 0
Views: 60
Reputation: 77337
I don't see any advantage to split when you could just slice the URL to get the part you want.
>>> row='s3://bucket-name/qwe/2022/02/24/qwe.csv'
>>> bucket_name = "bucket-name"
>>> row[len("s3://" + bucket_name + "/"):]
'qwe/2022/02/24/qwe.csv'
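A bare slice silently returns the wrong text if the row does not actually start with the expected prefix. A minimal sketch of a guarded version follows; the helper name strip_bucket_prefix is hypothetical and not from the question.

```python
def strip_bucket_prefix(row, bucket_name):
    """Return the part of row after 's3://<bucket_name>/'.

    strip_bucket_prefix is a hypothetical helper name used for illustration.
    """
    prefix = "s3://" + bucket_name + "/"
    # Check before slicing so a mismatched row fails loudly
    if not row.startswith(prefix):
        raise ValueError(f"{row!r} does not start with {prefix!r}")
    return row[len(prefix):]


row = 's3://bucket-name/qwe/2022/02/24/qwe.csv'
bucket_name = "bucket-name"
key = strip_bucket_prefix(row, bucket_name)
print(key)
```

On Python 3.9 or later, row.removeprefix(prefix) does the slicing step, but it returns the row unchanged when the prefix is missing, so the check is still worth keeping.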
But since this is a URL, you will have a more robust solution if you parse the URL. You can use the parts to verify that you got the string you want, and it will deal with other issues such as appended query strings.
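from urllib.parse import urlsplit
row='s3://bucket-name/qwe/2022/02/24/qwe.csv'
parts = urlsplit(row)
# The scheme is the part before "://"
if parts.scheme != "s3":
    raise ValueError("not s3 bucket")
# For an s3 URL, the netloc is the bucket name
if parts.netloc != "bucket-name":
    raise ValueError("not my bucket")
# The path starts with "/", so drop the first character
print(parts.path[1:])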
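To tie this back to the question, here is a sketch of the same checks wrapped in a function that takes the bucket name from a variable. The function name s3_key and the versionId query string are assumptions made up for the example.

```python
from urllib.parse import urlsplit


def s3_key(row, bucket_name):
    """Return the object key from an s3:// URL (hypothetical helper)."""
    parts = urlsplit(row)
    if parts.scheme != "s3":
        raise ValueError("not s3 bucket")
    if parts.netloc != bucket_name:
        raise ValueError("not my bucket")
    # urlsplit keeps the query string in parts.query, not in parts.path
    return parts.path[1:]


bucket_name = "bucket-name"
plain = s3_key('s3://bucket-name/qwe/2022/02/24/qwe.csv', bucket_name)
# Made-up query string, only to show that it does not end up in the key
with_query = s3_key('s3://bucket-name/qwe/2022/02/24/qwe.csv?versionId=abc', bucket_name)
print(plain)
print(with_query)
```

Both calls print the same key, since the query string is parsed into a separate field.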
Upvotes: 1
Reputation: 4062
Oops, you missed the quotes around the bucket name:
bucket_name='bucket-name'
new_row = row.split('s3://'+bucket_name+'/')[1]
Output:
'qwe/2022/02/24/qwe.csv'
Upvotes: 1
Reputation: 191
You can also do it like this, with an f-string:
row='s3://bucket-name/qwe/2022/02/24/qwe.csv'
bucket_name='bucket-name'
new_row = row.split(f"s3://{bucket_name}/")[1]
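One thing to watch with split(...)[1]: it raises IndexError when the prefix is not in the row at all. A possible alternative, sketched here with the same variable names, is str.partition, which splits at the first occurrence only and lets you check whether the prefix was found at the start.

```python
row = 's3://bucket-name/qwe/2022/02/24/qwe.csv'
bucket_name = 'bucket-name'

# partition returns (text before, separator, text after);
# the separator is '' when the prefix does not occur in row
head, sep, new_row = row.partition(f"s3://{bucket_name}/")
if not sep or head:
    raise ValueError("row does not start with the expected bucket prefix")
print(new_row)
```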
Upvotes: 1