Reputation: 3231
So I'm trying to wrap my head around document_type vs document_id when using the JDBC importer from Logstash and exporting to Elasticsearch.
I finally wrapped my head around indexes. But let's pretend I'm pulling from a big table of sensor data (weather-related readings like temperature and humidity), where each row has a sensor ID and the time it was recorded.
And I want to keep polling the database every so often.
What would document_type vs document_id be in this instance? This is all going to be stored (or whatever you want to call it) against one index.
The document_type vs document_id distinction confuses me, especially in regard to the JDBC importer.
If I set document_id to, say, my primary key, won't it get overwritten each time? So I'll just have one document of data each time? (Which seems pointless.)
Upvotes: 0
Views: 718
Reputation: 1715
The jdbc input plugin creates one JSON document per row, with one field for each column. So, to keep consistent with your example, a row of your sensor data would be imported as a document that looks like this:
{
  "sensor_id": 567,
  "temp": 90,
  "humidity": 6,
  "timestamp": "{time}",
  "@timestamp": "{time}" // auto-created field, the time Logstash received the document
}
You were right when you said that if you set document_id to your primary key, it would get overwritten: each poll would re-index every row onto the same ID, so you'd keep exactly one document per row rather than a growing history. You can disregard document_id unless you want to update existing documents in Elasticsearch, which I don't imagine you would want to do with this type of data. Let Elasticsearch generate the document ID for you.
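For illustration, here's a sketch of the scenario the question describes: pointing document_id at the primary-key column in the elasticsearch output (the hosts and index values here are made up). With this config, every poll overwrites each row's document in place instead of accumulating new ones:

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "sensors"
    # sprintf reference to the row's primary-key column;
    # each poll re-indexes (overwrites) the document with this ID
    document_id => "%{sensor_id}"
  }
}
```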
Now let's talk about document_type. If you want to set the document type, you need to set the type field in Logstash to some value (which will propagate into Elasticsearch). The type field in Elasticsearch is used to group similar documents. If all of the rows in the table you're importing with the jdbc plugin are of the same type (they should be!), you can set type in the jdbc input like this:
input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "mysql"
    parameters => { "favorite_artist" => "Beethoven" }
    schedule => "* * * * *"
    statement => "SELECT * from songs where artist = :favorite_artist"
    ...
    type => "weather"
  }
}
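To complete the picture, here's a sketch of an output block to go with that input (this block is my own illustration, not from the question; the hosts and index values are assumptions). The type field set above travels with the event, and the elasticsearch output uses it as the document's _type by default, though you can also reference it explicitly:

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index"
    # optional: the event's type field is used as _type by default,
    # but you can set it explicitly with a sprintf reference
    document_type => "%{type}"
  }
}
```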
Now, in Elasticsearch you can take advantage of the type field by setting a mapping for that type. For example, you might want:
PUT my_index
{
  "mappings": {
    "weather": {
      "_all": { "enabled": false },
      "properties": {
        "sensor_id": { "type": "integer" },
        "temp": { "type": "integer" },
        "humidity": { "type": "integer" },
        "timestamp": { "type": "date" }
      }
    }
  }
}
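Once the mapping is in place, you can scope searches to that type. A hypothetical example (the threshold value is made up) that finds hot readings:

```
GET my_index/weather/_search
{
  "query": {
    "range": {
      "temp": { "gte": 85 }
    }
  }
}
```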
Hope this helps! :)
Upvotes: 1