Ravi Kiran G

Reputation: 467

Unzip .gz files from Azure Data Lake using Python

I am trying to decompress .gz files stored in Azure Data Lake.

from azure.datalake.store import core, lib

Tenant_Id = '####'
Client_Key = '####'
Client_Id = '####' 
token = lib.auth(tenant_id=Tenant_Id, client_secret=Client_Key, client_id=Client_Id)

store_name = 'root'
# Connecting to adl
adl = core.AzureDLFileSystem(token, store_name=store_name)
# List of .gz files 
list_of_gz_files = adl.ls('/test/2018')
# Would like to unzip the files in the list_of_gz_files list

Is it possible to decompress them with the gzip module or something similar?

Upvotes: 1

Views: 1651

Answers (1)

Jay Gong

Reputation: 23767

Here are three options for decompressing gzip files in ADL.

1. Use Azure Data Factory to decompress the files with the Copy activity, which has native support for gzip files.


2. Use a Custom Activity in ADF. Create a job in Azure Batch that accesses the data lake and decompresses the files with Python code (using the gzip package).

3. Use a custom extractor in U-SQL. Please refer to this thread: How to preprocess and decompress .gz file on Azure Data Lake store?
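For the Python route in option 2, the following is a minimal sketch rather than a tested solution. It assumes the `adl` object from the question behaves like a filesystem whose `open(path, mode)` returns a binary file-like object that `gzip.GzipFile` can read from. The helper names `gunzip_on_store` and `gunzip_all` are hypothetical, and writing the output next to the source with the `.gz` suffix stripped is just one possible convention.

```python
import gzip
import shutil


def gunzip_on_store(fs, src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Stream-decompress src_path into dst_path.

    `fs` is assumed to expose open(path, mode) returning binary
    file-like objects (as the question's `adl` object is expected to).
    """
    with fs.open(src_path, 'rb') as raw, fs.open(dst_path, 'wb') as out:
        # Wrap the remote file so reads yield decompressed bytes.
        with gzip.GzipFile(fileobj=raw, mode='rb') as gz:
            # Copy in chunks so the whole file never sits in memory.
            shutil.copyfileobj(gz, out, chunk_size)


def gunzip_all(fs, paths):
    """Decompress every path ending in .gz; return the output paths."""
    written = []
    for path in paths:
        if not path.endswith('.gz'):
            continue  # skip anything that is not a gzip file
        dst_path = path[:-len('.gz')]
        gunzip_on_store(fs, path, dst_path)
        written.append(dst_path)
    return written
```

With the objects from the question, a call might look like `gunzip_all(adl, adl.ls('/test/2018'))`. Streaming through `shutil.copyfileobj` keeps memory use bounded for large files, at the cost of pulling all data through the machine that runs the script.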

Upvotes: 1
