Thomas Segato

Reputation: 5257

ImportError: Install adlfs to access Azure Datalake Gen2 and Azure Blob Storage even after adlfs is installed

I have an Azure Function with the code below:

storage_account_url = f"{self.datalake_settings.STORAGE_ENDPOINT}/{parquet_folder_path}/{file_name}.parquet"
storage_options = {
    "account_name": self.datalake_settings.STORAGE_ACCOUNT,
    "client_id": self.datalake_settings.RUNACCOUNT_ID,
    "client_secret": self.datalake_settings.RUNACCOUNT_KEY.get_secret_value(),
    "tenant_id": self.settings.TENANT_ID
}

df.to_parquet(storage_account_url, engine='pyarrow', compression='snappy', storage_options=storage_options)
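
For context, pandas hands storage_options to fsspec, and fsspec raises exactly this "Install adlfs ..." message when its own import of adlfs fails. The same code path should be reproducible without pandas by building the filesystem directly (a rough sketch; it assumes my STORAGE_ENDPOINT URL uses the abfs/abfss scheme and reuses storage_options from above):

import fsspec

# Constructing the filesystem triggers the same adlfs import that
# pandas/fsspec performs for abfs:// / abfss:// URLs.
fs = fsspec.filesystem("abfs", **storage_options)
print(type(fs))  # adlfs.AzureBlobFileSystem if the import succeeds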

This is my requirements.txt:

azure-functions
azure-identity
azure-storage-blob
azure-monitor-opentelemetry
opentelemetry-api
opentelemetry-sdk
opentelemetry-semantic-conventions
pydantic
adlfs
azure-storage-blob
azure-storage-file-datalake

This is my .venv/lib: [screenshot of the installed packages in .venv/lib]
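
Since that only shows my local .venv, the deployed environment could in principle differ; a minimal runtime check like this (a sketch, not in my current code) could confirm what the Function host actually sees:

from importlib import metadata
import logging

# Log the versions the deployed environment actually resolves.
for pkg in ("adlfs", "fsspec", "pandas", "pyarrow", "azure-storage-blob"):
    try:
        logging.info("%s==%s", pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        logging.warning("%s not found in the runtime environment", pkg)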

When I run this code I get the following error:

System.Private.CoreLib: Exception while executing function: Functions.get_exchangerates_trigger. System.Private.CoreLib: Result: Failure Exception: ImportError: Install adlfs to access Azure Datalake Gen2 and Azure Blob Storage

Any ideas how to troubleshoot this? It certainly looks like the adlfs and azure-storage-blob packages are installed.
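
To surface the underlying failure rather than fsspec's wrapped message, a direct import attempt inside the function could be logged (again just a sketch, not part of my current code):

import logging

try:
    import adlfs
    logging.info("adlfs %s imported fine", getattr(adlfs, "__version__", "unknown"))
except ImportError as exc:
    # fsspec swallows this and re-raises its generic "Install adlfs ..." message,
    # so logging it here shows the real cause (e.g. a missing transitive dependency).
    logging.error("adlfs import failed: %r", exc)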

Upvotes: 0

Views: 32

Answers (1)

Thomas Segato

Reputation: 5257

I found another approach that works: instead of letting pandas go through fsspec/adlfs, write the parquet to an in-memory buffer and upload it with the azure-storage-blob SDK directly:

    # Needs: from azure.identity import ClientSecretCredential,
    # from azure.storage.blob import BlobServiceClient, and import io
    credential = ClientSecretCredential(
        tenant_id=self.settings.TENANT_ID,
        client_id=self.datalake_settings.RUNACCOUNT_ID,
        client_secret=self.datalake_settings.RUNACCOUNT_KEY.get_secret_value()
    )
    
    # Create blob service client
    account_url = f"https://{self.datalake_settings.STORAGE_ACCOUNT}.blob.core.windows.net"
    blob_service_client = BlobServiceClient(
        account_url=account_url,
        credential=credential
    )
    
    # Container name (hard-coded here; in my setup it comes from the EXTRACT_ROOT
    # setting, which is in the format "container/path")
    container_name = "st-xx-lake-xxx-dev-ctn"
    
    # Get the blob path (everything after container name)
    blob_path = f"{parquet_folder_path}/{file_name}"
    
    # Get container client
    container_client = blob_service_client.get_container_client(container_name)
    
    # Write parquet to bytes buffer
    parquet_buffer = io.BytesIO()
    df.to_parquet(parquet_buffer, engine='pyarrow', compression='snappy')
    parquet_buffer.seek(0)
    
    # Upload the parquet file
    blob_client = container_client.upload_blob(
        name=blob_path,
        data=parquet_buffer,
        overwrite=True
    )
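
For completeness, reading the file back goes through the same client rather than adlfs (a sketch reusing the names above; container_name and blob_path are the placeholders from my snippet, and pandas is assumed to be imported as pd):

    # Read the parquet back via the blob SDK (reuses blob_service_client,
    # container_name and blob_path defined above).
    read_client = blob_service_client.get_blob_client(container=container_name, blob=blob_path)
    downloaded = io.BytesIO(read_client.download_blob().readall())
    df_roundtrip = pd.read_parquet(downloaded, engine='pyarrow')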

Upvotes: 0
