Reputation: 4605
Suppose I want to select all columns of a certain type from a DuckDB table. For example, selecting all VARCHAR
type columns, after creating a table like:
CREATE TABLE dummy (x VARCHAR, y BIGINT, z VARCHAR);
INSERT INTO dummy
VALUES ('a', 0, 'a'),
('b', 1, 'b'),
('c', 2, 'c');
Inngeneral, I might have an arbitrary number of VARCHAR
type columns, so this query should be "dynamic". I get a list of the relevant columns using DESCRIBE
:
SELECT column_name
FROM (DESCRIBE dummy)
WHERE column_type = 'VARCHAR';
This statement gives me a list of the column names which have type VARCHAR
. But how do I use this? I tried using the COLUMNS
expression:
SELECT COLUMNS(
c->c IN (
SELECT column_name
FROM (DESCRIBE dummy)
WHERE column_type = 'VARCHAR'
)
)
FROM dummy
But this gives me the error: BinderException: Binder Error: Table function cannot contain subqueries
. I don't really understand the error. I get the same error when trying:
SELECT COLUMNS(
c->list_contains(
(
SELECT column_name
FROM (DESCRIBE dummy)
WHERE column_type = 'VARCHAR'
),
c
)
)
FROM dummy
How do I connect the dots between getting a list of columns by querying DESCRIBE tbl
, and then using that list to select from tbl
?
Upvotes: 4
Views: 1397
Reputation: 21580
SQL-level variables were added in DuckDB 1.1.0
e.g. SET VARIABLE
/ GETVARIABLE
duckdb.sql("""
SET VARIABLE VARCHAR_NAMES = (
SELECT LIST(column_name)
FROM (DESCRIBE dummy)
WHERE column_type = 'VARCHAR'
)
""")
duckdb.sql("""
FROM DUMMY SELECT COLUMNS(x -> x in GETVARIABLE('VARCHAR_NAMES'))
""")
┌─────────┬─────────┐
│ x │ z │
│ varchar │ varchar │
├─────────┼─────────┤
│ a │ a │
│ b │ b │
│ c │ c │
└─────────┴─────────┘
Upvotes: 2
Reputation: 1
Try to use the duckdb_columns() function since it provides metadata about the columns available in the DuckDB instance :
SELECT column_name
FROM duckdb_columns()
WHERE table_name = 'dummy' AND data_type = 'VARCHAR'
I am using Python to concatenate the column names into a single string, then used that string in the SELECT :
import duckdb
con = duckdb.connect('your_database.db')
column_names = con.execute("""
SELECT column_name
FROM duckdb_columns()
WHERE table_name = 'dummy' AND data_type = 'VARCHAR';
""").fetchall()
column_names = [col[0] for col in column_names]
columns_string = ', '.join(column_names)
query = f"SELECT {columns_string} FROM dummy;"
result = con.execute(query).fetchdf()
print(result)
Check @jqurious it may suits you well using SQL level variables.
Upvotes: 1
Reputation: 116957
As you noticed, the DuckDB binder error message says:
Table function cannot contain subqueries
Effectively, this means that DuckDB does not support the type of "dynamic" SQL you have in mind.
The COLUMNS clause does allow lambdas, which give some degree of flexibility based on the column names, but in general(*), if you want to construct an SQL query dynamically, you have either to use the DuckDB CLI or rely on a programming language (such as Python).
To use the DuckDB CLI, you typically use .once
(to write out the constructed query) and then .read
(to execute it), as illustrated at https://duckdbsnippets.com/snippets/14/dynamic-sql-in-duckdb-cli
(*) The JSON extension provides some support for executing SQL SELECT
statements constructed from JSON. See json_execute_serialized_sql
in the manual.
Upvotes: 1