Constantine Kokkinos
Constantine Kokkinos

Reputation: 15

Is there any easier way to convert a string that looks like a list in DuckDB than regexp_split_to_array or string_split?

I am attempting to query a Pandas Dataframe with DuckDB that I materialize with read_sql_query.

Some of this data is stored in a JSON format and in the target column each value has a list of items - ["Value1", "Value2", "Valueetc"] that from the point of view of DuckDB is just a VARCHAR column.

My goal was to convert the string that looks like a list to an actual list in the DDB context -

STRING_SPLIT(REPLACE(REPLACE(REPLACE(ColumnName, '"',''),'[',''),']',''),',') AS ColumnName

I can definitely remove the quotes and brackets and return a list with this approach but I was hoping there's a less ugly/more idiomatic way to accomplish the goal.

Upvotes: 0

Views: 2514

Answers (2)

Hannes Mühleisen
Hannes Mühleisen

Reputation: 2562

This seems slightly more elegant

select (regexp_split_to_array('["Value1", "Value2", "Valueetc"]','", "|\["|"\]'))[2:-1];

This returns a list of three string values

[Value1, Value2, Valueetc] 

Regex should probably be made more robust wrt spaces and such

Upvotes: 1

AlexMonahan
AlexMonahan

Reputation: 86

Maybe this is easier! First, just load the JSON extension like in these docs: https://duckdb.org/docs/extensions/json Then try this: (untested!)

Select
  from_json(
    json(ColumnName),
    '["VARCHAR"]'
  ) as my_list

Upvotes: 1

Related Questions