odevney
odevney

Reputation: 83

Extract the highest key:value pair from a string in Standard SQL

I have the following data type below, it is a type of key value pair such as 116=0.2875. Big Query has stored this as a string. What I am required to do is to extract the key i.e 116 from each row.

To make things more complicated if a row has more than one key value pair the iteration to be extracted is the one with the highest number on the right e.g {1=0.1,2=0.8} so the extracted number would be 2.

I am struggling to use SQL to perform this, Particularly as some rows have one value and some have multiple:

enter image description here

This is as close as I have managed to get where I can create a bit of code to extract the highest right hand value (which I don't need) but I just cant seem to create something to either get the whole key/value pair which would be fine and work for me or just the key which would be great.

  column
  ,(SELECT MAX(CAST(Values AS NUMERIC)) FROM UNNEST(JSON_EXTRACT_ARRAY(REPLACE(REPLACE(REPLACE(column,"{","["),"}","]"),"=",","))) AS Values WHERE Values LIKE "%.%") AS Highest
  from `table`

Here is some sample data:

1   {99=0.25}
2   {99=0.25}
3   {99=0.25}
4   {116=0.2875, 119=0.6, 87=0.5142857142857143}
5   {105=0.308724832214765}
6   {105=0.308724832214765}
7   {139=0.5712754555198284}
8   {127=0.5767967894928858}    
9   {134=0.2530120481927711, 129=0.29696599825632086, 73=0.2662459427947186}    
10  {80=0.21242613001118038}

Any help on this conundrum would be greatly appreciated!

Upvotes: 1

Views: 525

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172994

Consider below approach

select column, 
  ( select cast(split(kv, '=')[offset(0)] as int64)
    from unnest(regexp_extract_all(column, r'(\d+=\d+.\d+)')) kv
    order by cast(split(kv, '=')[offset(1)] as float64) desc 
    limit 1
  ) key
from your_table               

if applied to sample data in your question - output is

enter image description here

Upvotes: 3

Related Questions