Reputation: 1099
Has anyone achieved this functionality before? It's equivalent to ls -ltr *xyz*
in Unix, and I would like to achieve the same in my Cloud Dataflow code.
Any lead would be appreciated.
Thank you.
Upvotes: 1
Views: 5476
Reputation: 49473
It is possible to do this filtering on the client side. Here is an example using the google-cloud Java client library to access the Google Cloud Storage APIs.
The example below lists all files in the root directory of the bucket whose names match the given regular expression pattern.
I've used regular expressions instead of the glob patterns that shell commands like ls support, since regular expressions are more flexible.
I would recommend going through the Java library documentation for google-cloud.
import com.google.api.gax.paging.Page;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.Storage.BlobListOption;
import com.google.cloud.storage.StorageOptions;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
/**
 * An example which lists the files in the specified GCS bucket matching the
 * specified regular expression pattern.
 *
 * <p>Run it as PROGRAM_NAME <BUCKET_NAME> <REGEX_MATCH_PATTERN>
 */
public class ListBlobsSample {
  public static void main(String[] args) throws IOException {
    // Instantiates a Storage client
    Storage storage = StorageOptions.getDefaultInstance().getService();
    // The name of the GCS bucket
    String bucketName = args[0];
    // The regular expression for matching blobs in the GCS bucket.
    // Example: '.*abc.*'
    String matchExpr = args[1];
    List<String> results = listBlobs(storage, bucketName, Pattern.compile(matchExpr));
    System.out.println("Results: " + results.size() + " items.");
    for (String result : results) {
      System.out.println("Blob: " + result);
    }
  }

  // Lists all blobs in the bucket matching the expression.
  // Specify a regex here. Example: '.*abc.*'
  private static List<String> listBlobs(Storage storage, String bucketName, Pattern matchPattern)
      throws IOException {
    List<String> results = new ArrayList<>();
    // Only list blobs in the current directory
    // (otherwise you also get results from the sub-directories).
    BlobListOption listOptions = BlobListOption.currentDirectory();
    Page<Blob> blobs = storage.list(bucketName, listOptions);
    for (Blob blob : blobs.iterateAll()) {
      if (!blob.isDirectory() && matchPattern.matcher(blob.getName()).matches()) {
        results.add(blob.getName());
      }
    }
    return results;
  }
}
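The sample above expects a regular expression. If you would rather pass an ls-style glob such as *xyz*, a minimal sketch of a converter could look like the following. The class GlobToRegex and its methods are hypothetical helpers, not part of the google-cloud library. The sketch handles only * and ?, and it assumes that, as in the shell, a wildcard should not match the / separator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper: converts an ls-style glob into a java.util.regex.Pattern. */
final class GlobToRegex {

  // Translates '*' to "[^/]*" and '?' to "[^/]"; all other characters are quoted literally.
  static Pattern globToPattern(String glob) {
    StringBuilder regex = new StringBuilder();
    StringBuilder literal = new StringBuilder();
    for (char c : glob.toCharArray()) {
      if (c == '*' || c == '?') {
        // Flush the pending literal run before emitting the wildcard.
        if (literal.length() > 0) {
          regex.append(Pattern.quote(literal.toString()));
          literal.setLength(0);
        }
        regex.append(c == '*' ? "[^/]*" : "[^/]");
      } else {
        literal.append(c);
      }
    }
    if (literal.length() > 0) {
      regex.append(Pattern.quote(literal.toString()));
    }
    return Pattern.compile(regex.toString());
  }

  // Keeps only the names that match the whole glob.
  static List<String> filter(List<String> names, String glob) {
    Pattern pattern = globToPattern(glob);
    return names.stream()
        .filter(name -> pattern.matcher(name).matches())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("abcxyz.txt", "xyz", "other.txt", "dir/xyz.log");
    System.out.println(filter(names, "*xyz*")); // prints [abcxyz.txt, xyz]
  }
}
```

The resulting Pattern can be passed to listBlobs() in the sample above in place of Pattern.compile(matchExpr).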
If you instead need to match just prefixes in the object names, the Objects: list API supports it. You need to specify the prefix query parameter in the request when doing GET https://www.googleapis.com/storage/v1/b/bucket/o. This is also supported by the Java client library (you will have to specify it while building the BlobListOption you pass to storage.list()).
From the API documentation:
prefix (string): Filter results to objects whose names begin with this prefix.
gsutil supports such wildcard queries, and it does the filtering solely on the client side (in some cases it issues multiple requests too).
Upvotes: 2
Reputation: 1003
The following may not fit your use case exactly, but it helps if you want to narrow down the results by a certain prefix first and then apply a regex to the remaining names.
Storage storage = StorageOptions.getDefaultInstance().getService();
Bucket bucket = storage.get(bucketName);
// Ask GCS to return only objects whose names start with the given prefix.
BlobListOption blobListOption = Storage.BlobListOption.prefix(prefixPattern);
for (Blob blob : bucket.list(blobListOption).iterateAll()) {
  System.out.println(blob);
}
Upvotes: 0
Reputation: 12145
GCS supports prefix queries, so you can efficiently list xyz*. To list *xyz*, however, you would have to list the entire bucket and filter on the client.
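To make the distinction concrete, here is a small sketch that works out how much of a glob can be pushed to the server as a prefix. GlobPrefix and literalPrefix are hypothetical names used only for illustration, and the sketch assumes * and ? are the only wildcards.

```java
/** Hypothetical helper: finds the part of a glob usable as a server-side prefix. */
final class GlobPrefix {

  // Returns the characters before the first wildcard ('*' or '?').
  // An empty result means no prefix applies and the whole bucket must be listed.
  static String literalPrefix(String glob) {
    for (int i = 0; i < glob.length(); i++) {
      char c = glob.charAt(i);
      if (c == '*' || c == '?') {
        return glob.substring(0, i);
      }
    }
    return glob; // no wildcard at all: the whole pattern is a literal prefix
  }

  public static void main(String[] args) {
    System.out.println("[" + literalPrefix("xyz*") + "]");             // prints [xyz]
    System.out.println("[" + literalPrefix("*xyz*") + "]");            // prints []
    System.out.println("[" + literalPrefix("logs/2020-*-xyz*") + "]"); // prints [logs/2020-]
  }
}
```

A non-empty result could be handed to BlobListOption.prefix() so that GCS narrows the listing, with the rest of the pattern still checked on the client. For *xyz* the result is empty, which is exactly the case where the full bucket has to be listed.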
Upvotes: 0