OPK

Reputation: 4180

How to elegantly parse a string to have exactly what you need?

I currently have an S3 bucket directory key like this:

String dir = "s3://mybucket/workflow/science/sweet-humoor/vars";

What I am trying to do is get the prefix of this S3 directory. The prefix is the key without the leading s3://mybucket/, so what I want is workflow/science/sweet-humoor/vars

Now, what would be an elegant way to achieve this? I know the quickest way is substring(14), but this will break whenever the bucket name changes.

How would you handle this?

Upvotes: 0

Views: 141

Answers (6)

MC Emperor

Reputation: 22997

The URIBuilder class from the org.apache.http.client.utils package can do that.

URIBuilder builder = new URIBuilder(dir);
String thePath = builder.getPath();

This automatically extracts /workflow/science/sweet-humoor/vars from the URI. The retrieved path does not include mybucket, because URIBuilder treats the first part immediately after the protocol specifier (s3://) as the hostname.

Further processing can be done through Path p = Paths.get(thePath).
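
If pulling in Apache HttpClient just for this is not an option, the same idea can be sketched with the JDK's own java.net.URI. This is a swapped-in technique, not URIBuilder itself, and the class and method names below (S3UriPrefix, bucketOf, prefixOf) are made up for illustration. It assumes the input is a well-formed s3://bucket/key string.

```java
import java.net.URI;

final class S3UriPrefix {

    // Hypothetical helper: the bucket is the authority part of the URI.
    // getAuthority() is used instead of getHost() because getHost() can
    // return null when the authority does not parse as a host name.
    static String bucketOf(String s3Uri) {
        return URI.create(s3Uri).getAuthority();
    }

    // Hypothetical helper: the key prefix is the URI path without its leading slash.
    static String prefixOf(String s3Uri) {
        String path = URI.create(s3Uri).getPath();
        if (path == null) {
            return "";
        }
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        String dir = "s3://mybucket/workflow/science/sweet-humoor/vars";
        System.out.println(bucketOf(dir)); // mybucket
        System.out.println(prefixOf(dir)); // workflow/science/sweet-humoor/vars
    }
}
```

URI.create throws an unchecked IllegalArgumentException on malformed input, which keeps the sketch free of checked exceptions.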

Upvotes: 0

sinclair

Reputation: 2861

String dir = "s3://mybucket/workflow/science/sweet-humoor/vars";
// skip past "s3://", then cut everything up to and including the next "/"
dir = dir.substring(dir.indexOf('/', dir.indexOf("//") + 2) + 1);
System.err.println(dir);  // prints workflow/science/sweet-humoor/vars

Upvotes: 0

Behnam Safari

Reputation: 3081

You can try this:

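// replace() treats its argument literally, so dots in the bucket name are not read as regex wildcards
String dir2 = dir.replace("s3://" + dir.split("/")[2] + "/", "");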

Upvotes: 0

daniu

Reputation: 15008

It's cleanest to use the Java library functions for paths instead of handling the Strings directly. What you have is a URI, so

// new URL(dir) would throw MalformedURLException because "s3" is not a protocol the JDK knows
URI uri = URI.create(dir);
Path fullpath = Paths.get(uri.getPath());

Now you have a Path (i.e. the "/workflow/science/sweet-humoor/vars" part; the bucket name is parsed as the authority and is available through uri.getAuthority()), and you can get the relative subpath by

// subpath returns a relative path; the end index is exclusive
Path subpath = fullpath.subpath(0, fullpath.getNameCount());

You can make a File out of this (subpath.toFile()), or just get the path string by

// uses the platform separator, so this contains backslashes on Windows
subpath.toString();
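
One caveat with Path.toString() is that it uses the platform's separator, while S3 keys always use forward slashes. A minimal sketch that joins the name elements with "/" explicitly is shown below. It assumes the path portion is obtained with URI.create(dir).getPath(), and the class and method names are hypothetical.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.StringJoiner;

final class PathPrefix {

    // Hypothetical helper: builds the key prefix from the Path's name elements
    // so the result always uses "/" regardless of the operating system.
    static String prefixOf(String s3Uri) {
        Path path = Paths.get(URI.create(s3Uri).getPath());
        StringJoiner joined = new StringJoiner("/");
        for (Path element : path) {   // iterates over the name elements, root excluded
            joined.add(element.toString());
        }
        return joined.toString();
    }
}
```

Going through Path is mainly worthwhile if you also want things like getParent() or getFileName() afterwards; for the prefix alone, the URI's path is already enough.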

Upvotes: 1

Ajith

Reputation: 59

I would split the string on "/", take the elements from index 3 onward, and join them with "/" again. Sample code in Python:

input_string = "s3://mybucket/workflow/science/sweet-humoor/vars"

list1 = (input_string.split("/"))
print(list1)
print("/".join(list1[3:]))

Output: workflow/science/sweet-humoor/vars
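
Since the question is about Java, here is a rough Java equivalent of the same split idea. It is only a sketch: the class and method names are hypothetical, and it assumes the string starts with s3://bucket/. Passing a limit of 4 to split keeps everything after the third slash in one piece, so no join is needed.

```java
final class SplitPrefix {

    // Hypothetical helper: splits into at most 4 parts:
    // "s3:", "", the bucket name, and the rest of the key.
    static String prefixOf(String s3Uri) {
        String[] parts = s3Uri.split("/", 4);
        // Fewer than 4 parts means there is nothing after the bucket name.
        return parts.length == 4 ? parts[3] : "";
    }

    public static void main(String[] args) {
        String dir = "s3://mybucket/workflow/science/sweet-humoor/vars";
        System.out.println(prefixOf(dir)); // workflow/science/sweet-humoor/vars
    }
}
```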

Upvotes: -1

Sweeper

Reputation: 271775

Use a regular expression with replaceAll:

String result = directoryKey.replaceAll("s3://[^/]+/", "");

The regex here is:

s3://[^/]+/

It matches the part that you want to remove, which is s3:// followed by a bunch of non-slash characters, followed by a slash.
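
If you also need the bucket name, or want to reject strings that are not S3 URIs, a variation with a precompiled Pattern and capture groups might look like this. It is a sketch under the assumption that the input has the form s3://bucket/key; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class S3KeyRegex {

    // Group 1: bucket (one or more non-slash characters); group 2: everything after the next slash.
    private static final Pattern S3_URI = Pattern.compile("s3://([^/]+)/(.*)");

    // Hypothetical helper: returns the key prefix, or throws if the input has another shape.
    static String prefixOf(String s3Uri) {
        Matcher m = S3_URI.matcher(s3Uri);
        if (!m.matches()) {   // matches() requires the whole string to fit the pattern
            throw new IllegalArgumentException("Not an s3://bucket/key URI: " + s3Uri);
        }
        return m.group(2);
    }

    // Hypothetical helper: returns the bucket name from the same pattern.
    static String bucketOf(String s3Uri) {
        Matcher m = S3_URI.matcher(s3Uri);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an s3://bucket/key URI: " + s3Uri);
        }
        return m.group(1);
    }
}
```

Unlike replaceAll, this fails loudly on unexpected input instead of silently returning the string unchanged.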

Upvotes: 1
