Reputation: 73
I'm having issues pulling specific variables from large text logs.
A normal log looks like this:
metadata {
unique_id: "88dvsq113-0dcf-410f-84fb-d342076def6f"
webhook_response_time: 155
intent_name: "Dogs are the best"
variable_one: "true"
variable_two: "false"
variable_three: "false"
}
I just want to pull the intent_name variable, so I use the regular expression:
SELECT REGEXP_EXTRACT(textPayload, r"intent_name:(.+)") AS intent_name FROM table1
to pull out just the value "Dogs are the best". Now, in the logs, there are two different parts that include the phrase "intent_name", so this regular expression doesn't pull what I need it to. Here is an example of new logs below:
metadata {
intent_id: "a664f00f-8105-4e09-bc34-2836dbe89ee1"
webhook_response_time: 105
intent_name: "Dogs are the best"
execution_sequence {
intent_id: "e231c181-31d9-4bfa-b2d8-7a52314bc628"
intent_name: "Cats are the best"
variable_one: "true"
variable_two: "false"
variable_three: "false"
}
How do I write an expression to just pull the first intent_name value "Dogs are the best" and not the one not inside the execution_sequence brackets?
Upvotes: 1
Views: 156
Reputation: 1269803
A JSON value would be much easier. But for the second log format, you could do:
select regexp_extract(textPayload, r"""intent_name: ("[^"]+")[\s\S]*execution_sequence""")
from (select '''metadata {
unique_id: "88dvsq113-0dcf-410f-84fb-d342076def6f"
webhook_response_time: 155
intent_name: "Dogs are the best"
variable_one: "true"
variable_two: "false"
variable_three: "false"
}''' as textPayload union all
SELECT '''metadata {
intent_id: "a664f00f-8105-4e09-bc34-2836dbe89ee1"
webhook_response_time: 105
intent_name: "Dogs are the best"
execution_sequence {
intent_id: "e231c181-31d9-4bfa-b2d8-7a52314bc628"
intent_name: "Cats are the best"
variable_one: "true"
variable_two: "false"
variable_three: "false"
}'''
) x
This does not work on the first format, but you can use a case
expression if you need to support both.
Upvotes: 1