bzm3r
bzm3r

Reputation: 4605

DuckDB: how do I use the result of a query on `DESCRIBE` to `SELECT` from a table?

Suppose I want to select all columns of a certain type from a DuckDB table. For example, selecting all VARCHAR type columns, after creating a table like:

CREATE TABLE dummy (x VARCHAR, y BIGINT, z VARCHAR);
INSERT INTO dummy
VALUES ('a', 0, 'a'),
  ('b', 1, 'b'),
  ('c', 2, 'c');

Inngeneral, I might have an arbitrary number of VARCHAR type columns, so this query should be "dynamic". I get a list of the relevant columns using DESCRIBE:

SELECT column_name
FROM (DESCRIBE dummy)
WHERE column_type = 'VARCHAR';

This statement gives me a list of the column names which have type VARCHAR. But how do I use this? I tried using the COLUMNS expression:

SELECT COLUMNS(
    c->c IN (
      SELECT column_name
      FROM (DESCRIBE dummy)
      WHERE column_type = 'VARCHAR'
    )
  )
FROM dummy

But this gives me the error: BinderException: Binder Error: Table function cannot contain subqueries. I don't really understand the error. I get the same error when trying:

SELECT COLUMNS(
    c->list_contains(
      (
        SELECT column_name
        FROM (DESCRIBE dummy)
        WHERE column_type = 'VARCHAR'
      ),
      c
    )
  )
FROM dummy

How do I connect the dots between getting a list of columns by querying DESCRIBE tbl, and then using that list to select from tbl?

Upvotes: 4

Views: 1397

Answers (3)

jqurious
jqurious

Reputation: 21580

SQL-level variables were added in DuckDB 1.1.0

e.g. SET VARIABLE / GETVARIABLE

duckdb.sql("""
SET VARIABLE VARCHAR_NAMES = (
   SELECT LIST(column_name)
   FROM (DESCRIBE dummy)
   WHERE column_type = 'VARCHAR'
)
""")

duckdb.sql("""
FROM DUMMY SELECT COLUMNS(x -> x in GETVARIABLE('VARCHAR_NAMES'))
""")
┌─────────┬─────────┐
│    x    │    z    │
│ varchar │ varchar │
├─────────┼─────────┤
│ a       │ a       │
│ b       │ b       │
│ c       │ c       │
└─────────┴─────────┘

Upvotes: 2

Amira Bedhiafi
Amira Bedhiafi

Reputation: 1

Try to use the duckdb_columns() function since it provides metadata about the columns available in the DuckDB instance :

SELECT column_name
    FROM duckdb_columns()
    WHERE table_name = 'dummy' AND data_type = 'VARCHAR'

I am using Python to concatenate the column names into a single string, then used that string in the SELECT :

import duckdb

con = duckdb.connect('your_database.db')

column_names = con.execute("""
    SELECT column_name
    FROM duckdb_columns()
    WHERE table_name = 'dummy' AND data_type = 'VARCHAR';
""").fetchall()

column_names = [col[0] for col in column_names]

columns_string = ', '.join(column_names)
query = f"SELECT {columns_string} FROM dummy;"

result = con.execute(query).fetchdf()

print(result)

Check @jqurious it may suits you well using SQL level variables.

Upvotes: 1

peak
peak

Reputation: 116957

As you noticed, the DuckDB binder error message says:

Table function cannot contain subqueries

Effectively, this means that DuckDB does not support the type of "dynamic" SQL you have in mind.

The COLUMNS clause does allow lambdas, which give some degree of flexibility based on the column names, but in general(*), if you want to construct an SQL query dynamically, you have either to use the DuckDB CLI or rely on a programming language (such as Python).

To use the DuckDB CLI, you typically use .once (to write out the constructed query) and then .read (to execute it), as illustrated at https://duckdbsnippets.com/snippets/14/dynamic-sql-in-duckdb-cli


(*) The JSON extension provides some support for executing SQL SELECT statements constructed from JSON. See json_execute_serialized_sql in the manual.

Upvotes: 1

Related Questions