Reputation: 15537
How can I combine two metrics into one, when they come from two separate resources?
I have a simple logs-based metric, `my-metric`, defined like this:
resource "google_logging_metric" "my_metric" {
  name   = "my-metric"
  filter = <<-EOT
    logName="projects/my-project/logs/my-app"
    labels.name="my.event"
  EOT
  label_extractors = {
    "event_type" = "EXTRACT(labels.event_type)"
  }
  metric_descriptor {
    value_type  = "INT64"
    metric_kind = "DELTA"
    labels {
      key        = "event_type"
      value_type = "STRING"
    }
  }
}
I recently moved my app to Google Cloud Run (GCR) which has its own logs, so I updated the metric's filter like this:
(
logName="projects/my-project/logs/my_app"
OR
logName="projects/my-project/logs/run.googleapis.com%2Fstdout"
)
labels.name="my.event"
What I didn't expect is that the metric becomes attached to a different resource, so logically I have two metrics. In MQL:
gce_instance::logging.googleapis.com/user/my-metric
global::logging.googleapis.com/user/my-metric
I want to keep my existing alerting policies that are based on this metric, so I'm wondering if there's a way to combine the metrics from the global and GCE instance resources into one metric (I would group by `event_type` and add them up, for example).
I have tried to just get them merged into one graph in the metrics explorer.
I have almost exclusively used a single log and the global resource before, so my intuition was to simply do this:
fetch global::logging.googleapis.com/user/my-metric
This would only get me half of the values though. I realized I'd get the other half like this:
fetch gce_instance::logging.googleapis.com/user/my-metric
Ok, let's just combine them. I know enough MQL to be a danger to myself and others (or so I thought).
{
fetch global::logging.googleapis.com/user/my-metric
;
fetch gce_instance::logging.googleapis.com/user/my-metric
}
| outer_join 0
| add
That only shows the `global` resource. It happens to be the first, so my intuition is to swap them around; sometimes that gives more information (I find the MQL reference very abstract, and I have mostly learned by copy-pasting examples and trial-and-error). Putting the `gce_instance` first throws two errors instead:
Line 8: Input table 1 does not have time series identifier column 'resource.instance_id' that is present in table 0. Table 0 must be a subset of the time series identifier columns of table 1.
Line 8: Input table 1 does not have time series identifier column 'resource.zone' that is present in table 0. Table 0 must be a subset of the time series identifier columns of table 1.
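For readers puzzling over the same message: as far as I can tell, the check behind these two errors is a plain set comparison on time series identifier columns. Here is a minimal Python sketch, assuming the identifier columns are each resource type's labels plus the metric's `event_type` label. The helper name `missing_identifier_columns` is made up for illustration, and this is a toy model, not how MQL is implemented.

```python
def missing_identifier_columns(defaulted_table_cols, other_table_cols):
    """Return the columns that break the 'must be a subset' rule.

    The table that receives a default value may only have identifier
    columns that also exist in the other table.
    """
    return sorted(set(defaulted_table_cols) - set(other_table_cols))


# Assumed identifier columns: the labels of each monitored-resource type
# plus the metric's event_type label.
gce_instance_cols = [
    "resource.project_id",
    "resource.zone",
    "resource.instance_id",
    "metric.event_type",
]
global_cols = ["resource.project_id", "metric.event_type"]

# gce_instance is table 0 (the one given a default), global is table 1.
print(missing_identifier_columns(gce_instance_cols, global_cols))
# ['resource.instance_id', 'resource.zone']

# Dropping the two extra columns, as `map drop [...]` does, satisfies the rule.
trimmed = [
    c for c in gce_instance_cols
    if c not in ("resource.zone", "resource.instance_id")
]
print(missing_identifier_columns(trimmed, global_cols))
# []
```

The two names in the first result are exactly the two columns named in the error messages.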
I don't really need `instance_id` or `zone`, so perhaps I can just remove them?
{
fetch gce_instance::logging.googleapis.com/user/my-metric
| map drop [resource.zone, resource.instance_id]
;
fetch global::logging.googleapis.com/user/my-metric
}
| outer_join 0
| add
And now it's only the `gce_instance` resource. For reference, here's what it looks like:
[Screenshot: only the `gce_instance` resource]
join
I'm sure MQL is beautiful once you fully grasp it, but to me it's still a black box. Here are a few other attempts. I basically went through the MQL reference, trying every keyword I could find:
{
fetch gce_instance::logging.googleapis.com/user/my-metric
| map drop [resource.zone, resource.instance_id]
;
fetch global::logging.googleapis.com/user/my-metric
}
| join
No data is available for the selected time frame
Don't know what that means. Next!
`join` and `group_by`
{
fetch gce_instance::logging.googleapis.com/user/my-metric
| map drop [resource.zone, resource.instance_id]
;
fetch global::logging.googleapis.com/user/my-metric
}
| group_by [metric.event_type], max(val())
| join
No data is available for the selected time frame
Useless... NEXT!
union_group_by
{
fetch gce_instance::logging.googleapis.com/user/my-metric
| map drop [resource.zone, resource.instance_id]
;
fetch global::logging.googleapis.com/user/my-metric
}
| union_group_by [metric.event_type]
Chart definition invalid. INVALID_ARGUMENT: Request contains an invalid argument.
That's very helpful, thanks. NEXT!
`outer_join` and `or_else`
The `outer_join` in my first attempt at least seemed to give two tables with values. Maybe I just need to combine them?
{
fetch gce_instance::logging.googleapis.com/user/my-metric
| map drop [resource.zone, resource.instance_id]
;
fetch global::logging.googleapis.com/user/my-metric
}
| outer_join 0
| or_else
Very interesting. I now get a bunch of different time series, grouped by `event_type`. They are all flatlining at 0 though. Changing to `outer_join 123`? Yes, they are now all constantly `123` instead.
The `outer_join` docs have this to say:
One or both of the left_default_value and right_default_value arguments must be given. Each corresponds to one input table (the first, "left", table or the second "right" table) and, when given for a table, that table will have rows created if it does not have some row that matches a row in the other table. Each argument specifies the value columns of the created row. If a default argument is given for a table, then the time series identifier columns in that table must be a subset of the time series of those of the other table and it can only have Delta time-series kind if the other table has Delta time-series kind.
I found this part vaguely interesting:
the time series identifier columns in that table must be a subset of the time series of those of the other table
Not sure what my time series identifier columns are. Perhaps they're just bad, but I'm not about to give up. What if they're not a subset? Perhaps I need to align, not aggregate? Did I mention that I don't know what I'm doing?
Aligning functions are use [not my typo] by the align table operation to produce an aligned table, one whose time series have points with timestamps at regular intervals.
I guess I need to invoke the align table operation with one of the aligning functions? Regular intervals sounds cool.
The aggregation docs have a section about aligners as well:
{
fetch gce_instance::logging.googleapis.com/user/my-metric
| map drop [resource.zone, resource.instance_id]
;
fetch global::logging.googleapis.com/user/my-metric
}
| align interpolate(10m)
# | group_by [metric.event_type], sum(val())
| outer_join 0
| add
Interpolation doesn't give me the missing data. This one gives me the `global` resource, but with a bit of interpolation where it doesn't have any data. This feels like a dead end as well.
I threw in a `group_by` as well just in case, no change.
I'm starting to get slightly frustrated now: I have data in two tables, but no matter what I do I can only see the data in one of them. I've combined time series in various ways with MQL before, and once it works I can usually explain why. It gets trickier when it doesn't work.
Perhaps we can get back to first principles somehow? I know `group_by []` clears the labels, maybe that would simplify things?
{
fetch gce_instance::logging.googleapis.com/user/my-metric
;
fetch global::logging.googleapis.com/user/my-metric
}
| group_by []
Line 1: Expect query to have 1 result but had 2.
Ouch. Adding a `| union` at the end?
Line 7: Input table 0 has legacy target schema 'cloud.CloudTask' which is different from input table 1's legacy target schema 'cloud.Global'. The inputs to the 'union' table operation are required to have the same column names, column types, and target schemas.
That's a new one! "Target schema" huh? Perhaps that's been the issue all along?
Let's consult the trusty reference! Schema... schema? No mention of schemas.
The examples perhaps? No, but it says "before you begin". I've read it before, but perhaps I missed something?
Some familiarity with Cloud Monitoring concepts including metric types, monitored-resource types, and time series is helpful. For an introduction to these concepts, see Metrics, time series, and resources.
But no, the "Metrics, time series, and resources" page doesn't mention legacy target schemas either, or even schemas in general. Neither does the Components of the metric model or the Notes on terminology pages.
Am I at another dead end? A quick Google search seems to indicate that it is.
`value[foo: val()]` etc.
`add` and `or_else` etc.
I have tried everything I can think of and read through most of the documentation a few times.
While writing this question, I found [an exciting answer](https://stackoverflow.com/a/67098846/98057) and tried it with my metrics:
{
fetch gce_instance
| metric 'logging.googleapis.com/user/my-metric'
| group_by [], sum(val())
| align rate(1m)
| every 1m
;
fetch global
| metric 'logging.googleapis.com/user/my-metric'
| group_by [], sum(val())
| align rate(1m)
| every 1m
}
| join
| add
No data is available for the selected time frame
I have of course verified that at least one of the "subqueries" returns some data; in this case it's this one:
fetch gce_instance
| metric 'logging.googleapis.com/user/my-metric'
| group_by [], sum(val())
| align rate(1m)
| every 1m
How can I combine these two metrics from two separate resource types into one using MQL?
Upvotes: 5
Views: 3882
Reputation: 537
After hours of digging and experimenting, I found that the following gave me what I wanted:
{
fetch gce_instance::logging.googleapis.com/user/ping
| group_by [metric.client]
;
fetch global::logging.googleapis.com/user/ping
| group_by [metric.client]
}
| union
Without the `group_by [metric.client]`, the `union` failed with an error along the lines of "The inputs to the 'union' table operation are required to have the same column names, column types, and target schemas".
In my case, I went on to pipe to `absent_for` to create an alert which would trigger if metric data was missing for any "client".
Upvotes: 1
Reputation: 15537
Here's the solution from GCP's support:
{
fetch gce_instance
| metric 'logging.googleapis.com/user/my-metric'
| group_by [], sum(val())
| align rate(1m)
| every 1m
;
fetch global
| metric 'logging.googleapis.com/user/my-metric'
| group_by [], sum(val())
| align rate(1m)
| every 1m
}
| outer_join 0,0
| add
I tried both `outer_join(0,0)` (syntax error) and `outer_join 0`, but `outer_join 0,0` did what it's supposed to: adding a default value to both tables. Obvious, once you see it.
Upvotes: 2