R2D2

Reputation: 10697

Parse HTTP logs and ingest only the api_name part of the URL into Elasticsearch

I have HTTP logs with a url field of the form "/api/api_name/api_id".

example1 url: /api/apiX/0121313123

example2 url: /api/apiY/012132/optionX/1000

What is the best practice for extracting only the "/api/api_name" part of the URL (dropping the id and anything after it) and ingesting it into Elasticsearch, so that I can later visualize the distribution of requests per api_name in Kibana?

Upvotes: 0

Views: 39

Answers (1)

Evaldas Buinauskas

Reputation: 14077

I'm not sure this is the best practice, but what works for us is indexing the URL with an additional sub-field that holds only the API part:

DELETE /urls

PUT /urls
{
  "settings": {
    "analysis": {
      "char_filter": {
        "api_extractor_char_filter": {
          "type": "pattern_replace",
          "pattern": "/?api/([^/]+)/?.*",
          "replacement": "api/$1"
        }
      },
      "normalizer": {
        "api_extractor": {
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "char_filter": [
            "api_extractor_char_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "url": {
        "type": "text",
        "fields": {
          "api": {
            "type": "keyword",
            "normalizer": "api_extractor"
          }
        }
      }
    }
  }
}

POST /urls/_doc
{"url":"/api/apiX/0121313123"}
POST /urls/_doc
{"url":"/api/apiY/012132/optionX/1000"}

GET /urls/_search
{
  "query": {
    "term": {
      "url.api": {
        "value": "/api/apiY"
      }
    }
  }
}

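This way we keep the original URL in url, and the url.api sub-field indexes only the part you're asking for. That sub-field can be used for exact searches and aggregations, so it will work just fine for your use case. Note that with the settings above the indexed value is lowercased and has no leading slash (for example api/apiy); the term query still matches "/api/apiY" because the normalizer is applied to the query value as well.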
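
For the per-API distribution from the question, a terms aggregation on the sub-field could look roughly like this. It is only a sketch that assumes the urls index and url.api field defined above; the aggregation name per_api is made up for illustration.

```
GET /urls/_search
{
  "size": 0,
  "aggs": {
    "per_api": {
      "terms": {
        "field": "url.api"
      }
    }
  }
}
```

The bucket keys are the normalized values, so with the two sample documents you should see one bucket per API, each with a count of 1. In Kibana the same field can be selected for a terms bucket in a visualization.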

Other possible ways:

  • Use an ingest pipeline to modify the document source and extract the API name from the URL at index time
  • Send the URL and the API name as separate fields from your application or log collector
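
As a rough illustration of the ingest pipeline option, something along these lines should work. The pipeline name extract_api and the target field api are hypothetical names chosen for this sketch, and the grok pattern assumes every URL of interest starts with /api/<api_name>.

```
PUT _ingest/pipeline/extract_api
{
  "description": "Copy the /api/<api_name> prefix of url into a separate api field",
  "processors": [
    {
      "grok": {
        "field": "url",
        "patterns": ["^(?<api>/api/[^/]+)"],
        "ignore_failure": true
      }
    }
  ]
}

POST _ingest/pipeline/extract_api/_simulate
{
  "docs": [
    { "_source": { "url": "/api/apiY/012132/optionX/1000" } }
  ]
}
```

The simulate call lets you check the extracted value before indexing anything. To apply the pipeline, index with POST /urls/_doc?pipeline=extract_api and map api as a keyword field so it can be aggregated. Unlike the normalizer approach, this changes the document source, and ignore_failure means URLs that don't match are indexed without the api field rather than rejected.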

Upvotes: 1
