I'm trying to build an intake catalog for my team. The datasets are on a shared MinIO server for which each user should have their own service account, and therefore a key/secret pair.
When creating the first catalog entry like this:
    import intake

    source = intake.open_netcdf(
        "s3://bucket/path/to/file.netcdf",
        storage_options=storage_options,
    )
where storage_options is a dictionary, read from a JSON file that each user keeps on their own file system, containing:
    {
        'key': 'KEY',
        'secret': 'SECRET',
        'client_kwargs': {'endpoint_url': 'http://X.X.X.X:9000'}
    }
i.e. the credentials s3fs needs to access the MinIO server, I get a catalog entry that contains the secrets:
    sources:
      my_dataset:
        args:
          storage_options:
            client_kwargs:
              endpoint_url: http://X.X.X.X:9000
            key: KEY
            secret: SECRET
          urlpath: s3://bucket/path/to/file.netcdf
        description: 'my description'
        driver: intake_xarray.netcdf.NetCDFSource
This catalog file can't be shared because it contains secrets, which defeats the purpose of having a catalog. So my question is: how do I make the storage_options part be read from the secrets file that each user has? (Ideally without switching that file from JSON to YAML, but that isn't a requirement.)
Fortunately, AWS tooling already provides for this: the botocore credential chain that s3fs relies on can pick up credentials either from environment variables or from files placed in well-known locations (see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#environment-variables and the sections below it).
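A minimal sketch of the environment-variable route, assuming the user's private JSON file has the key/secret/client_kwargs layout shown in the question. The helper name export_aws_env_from_json is hypothetical, and so is any particular file location you pass to it. It moves key and secret into the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables and returns the remaining, non-secret options.

```python
import json
import os
from pathlib import Path


def export_aws_env_from_json(path):
    """Hypothetical helper: expose the user's key/secret as standard AWS
    environment variables and return the non-secret storage options."""
    # Load the private per-user file, e.g.
    # {"key": ..., "secret": ..., "client_kwargs": {"endpoint_url": ...}}
    with open(Path(path).expanduser()) as f:
        opts = json.load(f)
    # botocore's default credential chain reads these two variables.
    os.environ["AWS_ACCESS_KEY_ID"] = opts.pop("key")
    os.environ["AWS_SECRET_ACCESS_KEY"] = opts.pop("secret")
    # Whatever is left (e.g. client_kwargs with endpoint_url) holds no secrets.
    return opts
```

The variables need to be set before the first S3 access in the session. With them in place, the shared catalog only has to keep client_kwargs with the endpoint_url under storage_options, since s3fs falls back to botocore's credential lookup when no key/secret is passed. Putting the same values in ~/.aws/credentials achieves this without running any code at load time.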
Intake also has ways of templating values, but these ultimately rely on the environment or on getting values directly from the user. Your case is further complicated because the values are needed not as a top-level parameter but nested inside storage_options. We could probably improve this system, but that would still leave the question of where the secret values should come from.
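To illustrate what "nested inside storage_options" means for templating, here is a plain-Python sketch that fills ${VAR} placeholders anywhere in a nested dict from the environment. This is not Intake's templating syntax; the function fill_from_env and the variable names MINIO_KEY and MINIO_SECRET are hypothetical.

```python
import os
import string


def fill_from_env(obj, env=None):
    """Hypothetical helper: recursively replace ${VAR} placeholders in the
    string values of a nested dict/list (such as storage_options).

    Raises KeyError if a referenced variable is not set.
    """
    env = os.environ if env is None else env
    if isinstance(obj, dict):
        return {k: fill_from_env(v, env) for k, v in obj.items()}
    if isinstance(obj, list):
        return [fill_from_env(v, env) for v in obj]
    if isinstance(obj, str):
        # string.Template handles ${NAME} substitution from a mapping.
        return string.Template(obj).substitute(env)
    return obj  # numbers, booleans, None pass through unchanged
```

For example, a catalog could store {"key": "${MINIO_KEY}", "secret": "${MINIO_SECRET}", "client_kwargs": {"endpoint_url": "http://X.X.X.X:9000"}}, and each user would resolve it against their own environment. The secrets still have to come from somewhere the user controls, which is the open question above.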