Reputation: 10563
Say I have the following data:
import duckdb
rel = duckdb.sql("""
FROM VALUES
([{'a': 'foo', 'b': 'bta'}]),
([]),
([{'a': 'jun', 'b': 'jul'}, {'a':'nov', 'b': 'obt'}])
df(my_col)
SELECT *
""")
which looks like this:
┌──────────────────────────────────────────────┐
│ my_col │
│ struct(a varchar, b varchar)[] │
├──────────────────────────────────────────────┤
│ [{'a': foo, 'b': bta}] │
│ [] │
│ [{'a': jun, 'b': jul}, {'a': nov, 'b': obt}] │
└──────────────────────────────────────────────┘
I would like to keep all rows where for any of the items in one of the elements of 'my_col'
, field 'a'
contains the substring 'bt'
So, expected output:
┌──────────────────────────────────────────────┐
│ my_col │
│ struct(a varchar, b varchar)[] │
├──────────────────────────────────────────────┤
│ [{'a': foo, 'b': bta}] │
│ [{'a': jun, 'b': jul}, {'a': nov, 'b': obt}] │
└──────────────────────────────────────────────┘
How can I write a SQL query to do that?
Upvotes: 2
Views: 21
Reputation: 21580
Maybe list_sum()
the bools or list_bool_or()
?
duckdb.sql("""
FROM VALUES
([{'a': 'foo', 'b': 'bta'}]),
([]),
([{'a': 'jun', 'b': 'jul'}, {'a':'nov', 'b': 'obt'}])
df(my_col)
SELECT *
WHERE list_bool_or(['bt' in s.b for s in my_col])
""")
┌──────────────────────────────────────────────┐
│ my_col │
│ struct(a varchar, b varchar)[] │
├──────────────────────────────────────────────┤
│ [{'a': foo, 'b': bta}] │
│ [{'a': jun, 'b': jul}, {'a': nov, 'b': obt}] │
└──────────────────────────────────────────────┘
The list comprehension is the same as list_apply(my_col, s -> 'bt' in s.b)
Upvotes: 1