Dave

Reputation: 19250

How do I set an expiration date on a file I create in Azure Data Lake using the Python SDK?

I'm using Python 3.8 and Azure Data Lake Storage Gen2. I want to set an expiration time on a file I save to the data lake. Following the azure.datalake.store.core.AzureDLFileSystem class reference on Microsoft Docs, I tried the following:

file_client = directory_client.create_file(filename)
file_client.upload_data(
    data,
    overwrite=True
)
ts = time.time() + 100
file_client.set_expiry(path=path, expire_time=ts)

but I get this error:

AttributeError: 'DataLakeFileClient' object has no attribute 'set_expiry'

What's the proper way to set an expiration time when creating a file on the data lake?

Upvotes: 1

Views: 1752

Answers (1)

Rahul Iyer

Reputation: 21025

You get this error because set_expiry is a method of azure.datalake.store.core.AzureDLFileSystem (the Gen1 SDK), but you are calling it on an object of type DataLakeFileClient (the Gen2 SDK). DataLakeFileClient has no method with that name.

To call set_expiry, you must first create an AzureDLFileSystem object.

For example, in Gen1, create the object as described here:

https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-data-operations-python

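## Import the Gen1 SDK modules
from azure.datalake.store import core, lib

## Declare variables
subscriptionId = 'FILL-IN-HERE'
adlsAccountName = 'FILL-IN-HERE'

## Authenticate (service-to-service, as in the linked article)
adlCreds = lib.auth(tenant_id='FILL-IN-HERE', client_secret='FILL-IN-HERE', client_id='FILL-IN-HERE', resource='https://datalake.azure.net/')

## Create a filesystem client object
adlsFileSystemClient = core.AzureDLFileSystem(adlCreds, store_name=adlsAccountName)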

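Using this object, you can call set_expiry on adlsFileSystemClient much as you do in your code example. Its signature is:

set_expiry(path, expiry_option, expire_time=None)

Note that expiry_option is a required argument, and the call in your example does not pass it.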

Just make sure you're trying to call methods on the correct type of object.
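
One quick way to check which kind of object you hold, and what it supports, is to inspect it before calling anything. The sketch below uses a hypothetical stand-in class, FakeFileClient, in place of a real SDK client so that it runs without Azure credentials. Its methods are placeholders, and the helper name expiry_methods is also made up for illustration.

```python
class FakeFileClient:
    """Hypothetical stand-in for an SDK client; not a real Azure class."""

    def upload_data(self, data, overwrite=False):
        pass

    def set_file_expiry(self, expiry_options, expires_on=None):
        pass


def expiry_methods(obj):
    """Return the public callable attributes of obj whose name mentions 'expir'."""
    return sorted(
        name
        for name in dir(obj)
        if not name.startswith("_")
        and "expir" in name.lower()
        and callable(getattr(obj, name))
    )


client = FakeFileClient()
print(type(client).__name__)          # which class is this object?
print(hasattr(client, "set_expiry"))  # does it have the method I want to call?
print(expiry_methods(client))         # which expiry-related methods does it have?
```

Running the same three checks against your real file_client should tell you whether the method you are about to call exists on that type.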

For Gen 2:

from azure.storage.filedatalake import DataLakeServiceClient

# Connection string for the storage account
connection_string = 'FILL-IN-HERE'
datalake_service_client = DataLakeServiceClient.from_connection_string(connection_string)

# Instantiate a FileSystemClient
file_system_client = datalake_service_client.get_file_system_client("mynewfilesystem")
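
For a per-file expiry on Gen2, recent releases of the azure-storage-file-datalake package expose a set_file_expiry method on DataLakeFileClient. The sketch below assumes your installed version provides that method and that the storage account has hierarchical namespace enabled. The helper name expire_after is hypothetical, and file_client is assumed to be the DataLakeFileClient returned by create_file. Check the SDK reference for your version before relying on it.

```python
from datetime import datetime, timedelta, timezone


def expire_after(file_client, seconds):
    """Ask the service to delete the file `seconds` from now.

    Assumes file_client is a DataLakeFileClient whose SDK version
    provides set_file_expiry(expiry_options, expires_on=None).
    """
    # "Absolute" takes a UTC datetime; the other documented options are
    # "NeverExpire", "RelativeToNow" and "RelativeToCreation".
    expires_on = datetime.now(timezone.utc) + timedelta(seconds=seconds)
    file_client.set_file_expiry("Absolute", expires_on=expires_on)
    return expires_on
```

In the question's code, this would be called as expire_after(file_client, 100) right after upload_data.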

For Gen2, you can expire blobs with a lifecycle management policy, as described here: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-portal#expire-data-based-on-age

Expire data based on age

Some data is expected to expire days or months after creation. You can configure a lifecycle management policy to expire data by deletion based on data age. The following example shows a policy that deletes all block blobs older than 365 days.

{
  "rules": [
    {
      "name": "expirationRule",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": [ "blockBlob" ]
        },
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 365 }
          }
        }
      }
    }
  ]
}
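
If you generate such policies from Python, a small helper can build the JSON document. This is a minimal sketch using only the standard library. The function name build_expiration_policy and its prefixes parameter are assumptions made for illustration; prefixMatch is the filter key the lifecycle policy schema uses to limit a rule to certain paths. The helper only produces the JSON. Applying it to the storage account (portal, CLI, or management SDK) is a separate step.

```python
import json


def build_expiration_policy(days, prefixes=None, rule_name="expirationRule"):
    """Build a lifecycle policy that deletes block blobs `days` after last modification."""
    if days < 0:
        raise ValueError("days must be non-negative")
    filters = {"blobTypes": ["blockBlob"]}
    if prefixes:
        # Restrict the rule to blobs whose path starts with one of these prefixes.
        filters["prefixMatch"] = list(prefixes)
    return {
        "rules": [
            {
                "name": rule_name,
                "enabled": True,
                "type": "Lifecycle",
                "definition": {
                    "filters": filters,
                    "actions": {
                        "baseBlob": {
                            "delete": {"daysAfterModificationGreaterThan": days}
                        }
                    },
                },
            }
        ]
    }


policy = build_expiration_policy(365)
print(json.dumps(policy, indent=2))
```

Keep in mind that lifecycle rules work in whole days, so they cannot express a short expiry such as the 100 seconds in the question.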

Upvotes: 2
