dlt forces me to install pyarrow for duckdb

Question

It's a very basic pipeline that loads some data into duckdb with dlt (dlthub).

Error

dlt.pipeline.exceptions.PipelineStepFailed: Pipeline execution failed at stage extract when processing package 1737502505.134826 with exception:



You must install additional dependencies to run dlt pyarrow helpers. If you use pip you may do the following:

pip install "dlt[parquet]"

Install pyarrow to be allow to load arrow tables, panda frames and to use parquet files.

I'm not reading data from any local files, so I don't see the point of installing pyarrow. How can I fix this error without installing dlt[parquet] (which I assume will pull in pyarrow)?

The code

The requirements file:

duckdb
dlt[duckdb]>=1.5.0
yfinance

Source

import dlt
import yfinance as yf


@dlt.source(name="yahoo")
def source_yahoo(ticker):
    @dlt.resource(primary_key="id", write_disposition="merge")
    def prices_and_dividends():
        # history() returns a pandas DataFrame
        yield yf.Ticker(ticker).history(period="1y")

    yield prices_and_dividends()

Pipeline

def load_prices(source) -> None:
    pipeline = dlt.pipeline(
        pipeline_name="load_prices",
        destination='duckdb',
        dataset_name="test_pipeline",
    )

    load_info = pipeline.run(source)
    print(load_info)  # noqa: T201


load_prices(source_yahoo('BRY'))

secrets.toml

destination.duckdb.credentials="duckdb:///../data/matstock.db"
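
A possible explanation, offered as an assumption rather than a confirmed answer: yf.Ticker(...).history() returns a pandas DataFrame, and dlt appears to route DataFrames through its Arrow helpers. That would explain the pyarrow requirement even though no files are read. If that is the cause, yielding plain dicts instead of the DataFrame should keep the pipeline on the regular Python-object path. The helper below is hypothetical (frame_to_records is not part of dlt or yfinance) and relies only on the DataFrame methods reset_index() and to_dict(orient="records").

```python
from typing import Any, Dict, Iterator


def frame_to_records(frame: Any) -> Iterator[Dict[str, Any]]:
    """Yield one plain dict per row of a pandas DataFrame.

    Hypothetical helper: `frame` is assumed to be a pandas DataFrame
    (for example the result of yfinance's Ticker.history()).
    """
    # reset_index() turns the index (the dates, in the yfinance case)
    # into an ordinary column so it is not lost in the conversion.
    flat = frame.reset_index()
    # to_dict(orient="records") returns a list of {column: value} dicts.
    for record in flat.to_dict(orient="records"):
        yield record
```

In the resource above, the direct yield would then become: yield from frame_to_records(yf.Ticker(ticker).history(period="1y")). Whether this fully avoids the pyarrow check depends on the dlt version, so treat it as something to try rather than a guaranteed fix.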
