Tom Updike
Tom Updike

Reputation: 73

Using REGEX_EXTRACT to pull one value but not the other from duplicates

I'm having issues pulling specific variables from large text logs.

A normal log looks like this:

 metadata {
    unique_id: "88dvsq113-0dcf-410f-84fb-d342076def6f"
    webhook_response_time: 155
    intent_name: "Dogs are the best"
    variable_one: "true"
    variable_two: "false"
    variable_three: "false"
  }

I just want to pull the intent_name variable, so I use the regular expression:

SELECT REGEXP_EXTRACT(textPayload, r"intent_name:(.+)") AS intent_name FROM table1

to pull out just the value "Dogs are the best". Now, in the logs, there are two different parts that include the phrase "intent_name", so this regular expression doesn't pull what I need it to. Here is an example of new logs below:

  metadata {
    intent_id: "a664f00f-8105-4e09-bc34-2836dbe89ee1"
    webhook_response_time: 105
    intent_name: "Dogs are the best"
    execution_sequence {
      intent_id: "e231c181-31d9-4bfa-b2d8-7a52314bc628"
      intent_name: "Cats are the best"
      variable_one: "true"
      variable_two: "false"
      variable_three: "false"
    }

How do I write an expression to just pull the first intent_name value "Dogs are the best" and not the one not inside the execution_sequence brackets?

Upvotes: 1

Views: 156

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269803

A JSON value would be much easier. But for the second log format, you could do:

select regexp_extract(textPayload, r"""intent_name: ("[^"]+")[\s\S]*execution_sequence""")
from (select '''metadata {
    unique_id: "88dvsq113-0dcf-410f-84fb-d342076def6f"
    webhook_response_time: 155
    intent_name: "Dogs are the best"
    variable_one: "true"
    variable_two: "false"
    variable_three: "false"
  }''' as textPayload union all
      SELECT '''metadata {
    intent_id: "a664f00f-8105-4e09-bc34-2836dbe89ee1"
    webhook_response_time: 105
    intent_name: "Dogs are the best"
    execution_sequence {
      intent_id: "e231c181-31d9-4bfa-b2d8-7a52314bc628"
      intent_name: "Cats are the best"
      variable_one: "true"
      variable_two: "false"
      variable_three: "false"
    }'''
     ) x

This does not work on the first format, but you can use a case expression if you need to support both.

Upvotes: 1

Related Questions