Reputation: 4032
We have a table with a column 'streamId' that itself is composed of several parts. I'm trying to explode the parts into separate columns in a query on the fly but not getting very far.
This is an extract of my values:
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-0_averageFlow"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-0_averageLatency"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-0_event"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-0_latency95"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-0_latency99"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-0_maxMovingAverageLatency"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-0_movingAverageFlow"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-0_movingAverageLatency"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-0_sdLatency"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-1_averageFlow"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-1_averageLatency"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-1_event"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-1_latency95"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-1_latency99"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-1_maxMovingAverageLatency"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-1_movingAverageFlow"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-1_movingAverageLatency"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-1_sdLatency"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-2_averageFlow"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-2_averageLatency"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-2_event"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-2_latency95"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-2_latency99"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-2_maxMovingAverageLatency"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-2_movingAverageFlow"
"latencyPerNodeByMachineAggregator:SOURCE_X/PROCESS_Y->DESTINATION_Z/strategy_A->fcgw-2_sdLatency"
The components are <sourceId>:<nodePath>_<columnId>
That _
is the LAST _
character in the string - nodePath can have _
inside. Also, it is valid to have only <sourceId>:_<columnId>
without a nodePath when I have a root node.
It seems QuestDB does not support \1
or similar for substitutions and I need to regexp_replace each part individually, but so far no luck. This is what I have, which is close but it is not capturing the columnId
.
select streamId,regexp_replace(streamId, ':(.+)', '') as sourceId,
regexp_replace(streamId, '^(.+:)$', '') as nodepathParts,
regexp_replace(streamId, '^(.*)_.*?$', '') as columnId from table;
Upvotes: 0
Views: 8
Reputation: 4032
For the columnId in QuestDB we can use regexp_replace(streamId, '^.*_', '')
, which will replace anything before the very last _
by empty, so the result will be the columnId
.
Similarly, we can use this other expression '_([^_]+$)'
to replace anything after the last underscore until the end of the string (the columnId) with empty, leaving only the whole string except the last part, and then we can regexp_replace with this '.+:'
to remove the beginning of the string until the colon, leaving as result the NodePath
(if any).
And we can use this regexp ':(.+)'
, which could also have been a split_part()
, to keep only the first segment, the sourceId.
Altogether, it would be this:
with s as (
SELECT 'PROCESS_X-agentPerMachineAggregator:STAGE_A/strategy_A/27320_averageFlow' AS streamId
UNION ALL SELECT 'PROCESS_X-agentPerMachineAggregator:STAGE_A_movingAverageFlow'
UNION ALL SELECT 'PROCESS_X-agentPerMachineAggregator:_averageFlow'
UNION ALL SELECT 'PROCESS_Y-agentPerMachineAggregator:STAGE_B/strategy_A/27320_averageFlow'
UNION ALL SELECT 'PROCESS_Y-latencyPerNodeByMachineAggregator:STAGE_B->STAGE_A/strategy_A->strategy_A_averageFlow'
)
select streamId,
regexp_replace(streamId, ':.*', '') as sourceId,
regexp_replace(regexp_replace(streamId, '_([^_]+$)', ''),'.+:','') as nodePath,
regexp_replace(streamId, '^.*_', '') as columnId
from s;
Upvotes: 0