Reputation: 15
I am attempting to use DuckDB to query a Pandas DataFrame that I materialize with read_sql_query.
Some of this data is stored in JSON format, and each value in the target column holds a list of items, e.g. ["Value1", "Value2", "Valueetc"], which from DuckDB's point of view is just a VARCHAR column.
My goal is to convert the string that looks like a list into an actual list in the DuckDB context:
STRING_SPLIT(REPLACE(REPLACE(REPLACE(ColumnName, '"',''),'[',''),']',''),',') AS ColumnName
This approach does remove the quotes and brackets and returns a list, but I was hoping there is a less ugly/more idiomatic way to accomplish the goal.
Upvotes: 0
Views: 2514
Reputation: 2562
This seems slightly more elegant:
select (regexp_split_to_array('["Value1", "Value2", "Valueetc"]','", "|\["|"\]'))[2:-1];
This returns a list of three string values:
[Value1, Value2, Valueetc]
The regex should probably be made more robust with respect to spaces and the like.
Upvotes: 1
Reputation: 86
Maybe this is easier! First, load the JSON extension as described in the docs: https://duckdb.org/docs/extensions/json Then try this (untested!):
SELECT
    from_json(
        json(ColumnName),
        '["VARCHAR"]'
    ) AS my_list
Upvotes: 1