Temiloluwa

Reputation: 23

MainThread: Vaex: Error while Opening Azure Data Lake Parquet file

I tried to open a Parquet file stored in Azure Data Lake Storage Gen2 with vaex, using a generated SAS URL (with the expiry datetime and token embedded in the URL):

vaex.open(sas_url)

and I got the error

ERROR:MainThread:vaex:error opening 'the path which was also the sas_url(can't post it for security reasons)' ValueError: Do not know how to open (can't publicize the sas url) , no handler for https is known

How do I get vaex to read the file, or is there another Azure storage option that works better with vaex?

Upvotes: 0

Views: 465

Answers (2)

Temiloluwa

Reputation: 23

I finally found a solution! Vaex can read files in Azure blob storage with this:

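import vaex
import adlfs

# Fill in the details of your own storage account
storage_account = "..."
account_key = "..."
container = "..."
object_path = "..."

# Build an fsspec-compatible filesystem for the account, then pass it to vaex
fs = adlfs.AzureBlobFileSystem(account_name=storage_account, account_key=account_key)
df = vaex.open(f"abfs://{container}/{object_path}", fs=fs)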

For more details, see https://github.com/vaexio/vaex/issues/1272, which is where I found the solution.
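
Since the question was about a SAS URL rather than an account key, here is a minimal sketch of how the same approach might look with a SAS token. It assumes that the SAS URL has the usual shape https://<account>.<blob or dfs>.core.windows.net/<container>/<path>?<token>, and it relies on the sas_token argument of adlfs.AzureBlobFileSystem. The helper names split_sas_url and open_with_sas are made up for illustration, and I have not verified this against a real storage account.

```python
from urllib.parse import urlsplit


def split_sas_url(sas_url):
    """Split a SAS URL into (account_name, container, object_path, sas_token).

    Assumes the account name is the first label of the host name and the
    container is the first path segment.
    """
    parts = urlsplit(sas_url)
    account_name = parts.netloc.split(".")[0]
    container, _, object_path = parts.path.lstrip("/").partition("/")
    # The query string of a SAS URL is the SAS token itself
    return account_name, container, object_path, parts.query


def open_with_sas(sas_url):
    """Hypothetical helper: open a file with vaex using only its SAS URL."""
    # Imported here so that split_sas_url works without vaex/adlfs installed
    import adlfs
    import vaex

    account_name, container, object_path, sas_token = split_sas_url(sas_url)
    fs = adlfs.AzureBlobFileSystem(account_name=account_name, sas_token=sas_token)
    return vaex.open(f"abfs://{container}/{object_path}", fs=fs)
```

Usage would then be df = open_with_sas(sas_url). The point of splitting the URL is that vaex picks a handler from the URL scheme, so the file has to be addressed as abfs://container/path with the credentials carried by the filesystem object rather than by the URL.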

Upvotes: 2

Utkarsh Pal

Reputation: 4554

Vaex cannot read data from an https source, which is why you are getting the error "no handler for https is known".

Also, according to the documentation, vaex supports data input from Amazon S3 buckets and Google Cloud Storage.

Cloud support:

Amazon Web Services S3

Google Cloud Storage

Other cloud storage options

The documentation mentions that other cloud storage options are also supported, but I could not find any document or example that fetches data from an Azure storage account, let alone with a SAS URL.

Please also see the vaex API documentation for more information.

Upvotes: -1
