scottmont

Reputation: 39

SnowPlow Analytics Schemas

I'm trying to create a schema with arrays for data retrieved from Performance Insights.

{
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "description": "Performance Insights",
    "self": {
        "vendor": "com.acme",
        "name": "performance_insights",
        "format": "jsonschema",
        "version": "1-0-3"
    },
    "type": "object",
    "properties": {
        "SeriesStartTime": {
            "type": "string",
            "format": "date-time",
            "description" : "timestamp"
        },
        "SeriesEndtime": {
            "type": "string",
            "format": "date-time",
            "description" : "timestamp"
        },
        "Identifier": {
            "description": "DataBase",
            "type": "string",
            "maxLength": 128
        },
        "MetricList": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "Key": {
                        "type": "object",
                        "description": "Key Metric",
                        "properties": {
                            "Metric": {
                                "type": "string",
                                "description": "Load Avg"
                            },
                            "Dimensions": {
                                "properties": {
                                    "tokenized_db": {
                                        "type": "string",
                                        "maxLength": 128
                                    },
                                    "tokenized_id": {
                                        "type": "string",
                                        "maxLength": 128
                                    },
                                    "tokenized_statement": {
                                        "type": "string"
                                    }
                                }
                            }
                        }
                    },
                    "DataPoints": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "Timestamp": {
                                    "description": "timestamp",
                                    "type": "string",
                                    "format": "date-time"
                                },
                                "Value": {
                                    "description": "Value",
                                    "type": "number"
                                }
                            }
                        }
                    }
                }
            },
            "minItems": 1
        }
    },
    "additionalProperties": false
}

It lints OK, and then I send data to it:

{
    "schema": "iglu:com.acme/performance_insights/jsonschema/1-0-3",
    "data": {
        "SeriesStartTime": "2021-12-09T19:00:00-05:00",
        "SeriesEndtime": "2021-12-09T20:00:00-05:00",
        "Identifier": "db-5LHLHN5OGHFFFFMHRGDM",
        "MetricList": [
            {
                "Key": {
                    "Metric": "db.load.avg"
                },
                "DataPoints": [
                    {
                        "Timestamp": "2021-12-09T19:01:00-05:00",
                        "Value": 0.01818181818181818
                    },
                    {
                        "Timestamp": "2021-12-09T19:25:00-05:00",
                        "Value": 0.01818181818181818
                    }
                ]
            }
        ]
    }
}

I've pushed the schema to my repo. I have pushed a couple of others that work, but they are not complex enough to receive array data.

When I intentionally set a type wrong, I see errors in my bad collector. When everything is correct, as above, I only see

schemaKey:"iglu:com.acme/performance_insights/jsonschema/1-0-3"
schemaCriterion:"iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-*"

as the failure.message.

Any ideas?

Upvotes: 0

Views: 891

Answers (1)

Dilyan

Reputation: 71

I'm making these assumptions:

  • the bad rows that you are citing in your example are of type tracker_protocol_violations
  • they are of subtype CriterionMismatch.

This type of failure happens when the supplied payload is valid JSON and is self-describing, but its schema does not match the associated schema criterion. The failure.message shows you the schema of the failed event and the schema criterion that needed to be matched.

In your case the schema is com.acme/performance_insights/jsonschema/1-0-3 but the criterion to match is com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-*.
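To make the mismatch concrete, here is a minimal sketch of how a schema key can be compared against a criterion with a wildcard version. It is only an illustration of the idea, not Snowplow's actual implementation; the function names `parse_iglu_uri` and `matches_criterion` are hypothetical.

```python
def parse_iglu_uri(uri):
    """Split 'iglu:vendor/name/format/version' into its four parts."""
    prefix = "iglu:"
    if not uri.startswith(prefix):
        raise ValueError(f"not an Iglu URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 4:
        raise ValueError(f"expected vendor/name/format/version: {uri!r}")
    return tuple(parts)


def matches_criterion(schema_key, criterion):
    """Return True if schema_key satisfies criterion ('*' matches any version part)."""
    k_vendor, k_name, k_format, k_version = parse_iglu_uri(schema_key)
    c_vendor, c_name, c_format, c_version = parse_iglu_uri(criterion)

    # Vendor, name and format must be identical.
    if (k_vendor, k_name, k_format) != (c_vendor, c_name, c_format):
        return False

    # Versions are MODEL-REVISION-ADDITION; compare part by part.
    k_parts = k_version.split("-")
    c_parts = c_version.split("-")
    if len(k_parts) != len(c_parts):
        return False
    return all(c == "*" or c == k for k, c in zip(k_parts, c_parts))


criterion = "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-*"

# The event from the question: vendor and name differ, so it cannot match.
event_matches = matches_criterion(
    "iglu:com.acme/performance_insights/jsonschema/1-0-3", criterion
)

# A payload_data envelope: same vendor/name/format, and 1-0-4 fits 1-0-*.
envelope_matches = matches_criterion(
    "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4", criterion
)
```

Here `event_matches` is `False` and `envelope_matches` is `True`, which is why the outermost JSON has to be the payload_data envelope rather than the custom event itself.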

You are seeing this error because the self-describing event needs to be wrapped in a payload that uses the com.snowplowanalytics.snowplow/payload_data schema, like this:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4",
  "data": [
    {
      "...": "...",
      "ue_px": "<base64-encoded-string>",
      "...": "..."
    }
  ]
}

where <base64-encoded-string> is the base64-encoded JSON with the com.acme/performance_insights schema.
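As a rough sketch of that wrapping, the snippet below builds such an envelope with the Python standard library. Several details are my assumptions rather than something stated above: that the custom event is first wrapped in an `unstruct_event` self-describing JSON (as the official trackers do), that URL-safe base64 is used, and that the `e`, `p` and `tv` fields are needed alongside `ue_px`. The `tv` value and the helper name `wrap_event` are placeholders. In practice an official Snowplow tracker does all of this for you.

```python
import base64
import json

# Shortened version of the event from the question.
event = {
    "schema": "iglu:com.acme/performance_insights/jsonschema/1-0-3",
    "data": {
        "SeriesStartTime": "2021-12-09T19:00:00-05:00",
        "SeriesEndtime": "2021-12-09T20:00:00-05:00",
        "Identifier": "db-5LHLHN5OGHFFFFMHRGDM",
        "MetricList": [
            {
                "Key": {"Metric": "db.load.avg"},
                "DataPoints": [
                    {"Timestamp": "2021-12-09T19:01:00-05:00", "Value": 0.01818181818181818}
                ],
            }
        ],
    },
}


def wrap_event(self_describing_event):
    """Wrap a self-describing event in a payload_data envelope (hypothetical helper)."""
    # Assumption: the event is nested inside an unstruct_event wrapper first.
    unstruct = {
        "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
        "data": self_describing_event,
    }
    # Assumption: URL-safe base64 of the UTF-8 encoded JSON string.
    encoded = base64.urlsafe_b64encode(json.dumps(unstruct).encode("utf-8")).decode("ascii")
    return {
        "schema": "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4",
        "data": [
            {
                "e": "ue",            # event type: self-describing (unstructured) event
                "p": "srv",           # platform; placeholder value
                "tv": "manual-0.0.1", # tracker version; placeholder value
                "ue_px": encoded,
            }
        ],
    }


payload = wrap_event(event)
body = json.dumps(payload)  # what would be sent as the POST body to the collector
```

Decoding `payload["data"][0]["ue_px"]` gives back the wrapped event, which is a quick way to check what the collector will actually receive.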

You can read more about this type of failure in the Snowplow documentation here.

Upvotes: 0
