scottmcallister

Reputation: 303

How to add numeric IDs to elasticsearch documents when reading from CSV file using Logstash?

After importing my elasticsearch documents from a CSV file using Logstash, my documents have their ID value set to long alphanumeric strings. How can I have each document ID set to a numeric value instead?

Here is basically what my Logstash config looks like:

input {
    file {
        path => "/path/to/movies.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}

filter {
    csv {
        columns => ["title","director","year","country"]
        separator => ","
    }
    mutate {
        convert => {
            "year" => "integer"
        }
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "movie"
        document_type => "movie"
    }
    stdout {}
}

Upvotes: 0

Views: 887

Answers (1)

Val

Reputation: 217554

The first and easiest option is to add a new ID column to your CSV and use that field as the document ID.
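For instance, here is a minimal sketch of what that could look like, assuming the CSV gains a leading id column (the column name and its position are assumptions):

filter {
    csv {
        # "id" is the hypothetical new column added to the CSV
        columns => ["id","title","director","year","country"]
        separator => ","
    }
    mutate {
        convert => {
            "year" => "integer"
        }
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "movie"
        document_type => "movie"
        # use the CSV's id column as the document ID
        document_id => "%{id}"
    }
    stdout {}
}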

Another option is to use a ruby filter that adds a dynamic ID to your events. One downside of this solution is that if your CSV changes and you re-run your pipeline, documents might not get the same IDs. Another downside is that you need to run your pipeline with a single worker (i.e. with -w 1), because the @id_seq counter cannot be shared between pipeline workers.

filter {
    csv {
        columns => ["title","director","year","country"]
        separator => ","
    }
    mutate {
        convert => {
            "year" => "integer"
        }
    }
    # create an incrementing numeric ID for each event
    # (an instance variable is needed so init and code share state)
    ruby {
        init => "@id_seq = 0"
        code => "
            event.set('id', @id_seq)
            @id_seq += 1
        "
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "movie"
        document_type => "movie"
        document_id => "%{id}"
    }
    stdout {}
}
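To enforce the single-worker constraint, you'd launch Logstash along these lines (assuming the config above is saved as movies.conf, a hypothetical filename):

bin/logstash -f movies.conf -w 1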

Upvotes: 1
