ankitkhandelwal185

Reputation: 1111

How to automate updating data from MySQL to Logstash

Currently I am working on the Elastic Stack with MySQL. Everything is working fine: the data in the MySQL database is available in Elasticsearch via Logstash. But when new data is entered into the MySQL database, I have to restart Logstash for it to show up. Or can this be done with the schedule option in the Logstash config file?

input {

  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    # The user we wish to execute our statement as
    jdbc_user => "root"
    jdbc_password => "ankit"
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/home/ankit/Downloads/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # run the query every minute
    # schedule => "* * * * *"
    # our query
    statement => "SELECT * FROM ghijkl"
  }

  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    # The user we wish to execute our statement as
    jdbc_user => "root"
    jdbc_password => "ankit"
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/home/ankit/Downloads/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # run the query every minute
    # schedule => "* * * * *"
    # our query
    statement => "SELECT * FROM abcdef"
  }
}

But this isn't a good approach. I am thinking of using web-hooks, but there is no resource available on how to do so. I tried the Logstash HTTP input plugin from the documentation page, but it was no help here.
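For reference, the http input plugin only opens an HTTP endpoint that an external system has to push data to, so on its own it doesn't pull anything from MySQL. A minimal sketch of such an input, using what I believe are the plugin's default host and port, would be something like:

input {
  http {
    # listens for incoming HTTP requests; something on the MySQL side
    # would still have to POST the changed rows to this endpoint
    host => "0.0.0.0"
    port => 8080
  }
}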

Please help.

Upvotes: 1

Views: 684

Answers (1)

Michael Dz

Reputation: 3834

You can fetch only the recent data, let's say every 15 minutes, using a special query:

SELECT * FROM ghijkl" WHERE EVENT_TIME_OCCURRENCE_FIELD > :sql_last_value

In place of :sql_last_value, the value of the tracking column from the most recent record is inserted. When the query is run for the first time, the tracking_column value defaults to 01.01.1970.

Required configuration for Logstash:

schedule => "*/15 * * * *"
use_column_value => true
tracking_column => 'EVENT_TIME_OCCURRENCE_FIELD'

For every input you should also specify the last_run_metadata_path parameter to avoid problems in the future: when you have many inputs and some of them use the same table but different schemas, the metadata might otherwise be overwritten and produce unexpected results.

last_run_metadata_path => "PATH_TO_FILE_FOR_META_DATA"
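Putting it all together, a single jdbc input with incremental loading could look roughly like this (credentials, paths, and the table name are taken from the question; the timestamp column and the state-file path are placeholders you would adapt). Note that the jdbc input lowercases column names by default (lowercase_column_names), which is why the tracking column is written in lowercase here:

input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    jdbc_user => "root"
    jdbc_password => "ankit"
    jdbc_driver_library => "/home/ankit/Downloads/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # run the query every 15 minutes
    schedule => "*/15 * * * *"
    # only fetch rows newer than the value saved from the previous run
    statement => "SELECT * FROM ghijkl WHERE EVENT_TIME_OCCURRENCE_FIELD > :sql_last_value"
    use_column_value => true
    tracking_column => "event_time_occurrence_field"
    tracking_column_type => "timestamp"
    # separate state file for this input, so other inputs don't overwrite it
    last_run_metadata_path => "/home/ankit/.logstash_jdbc_last_run_ghijkl"
  }
}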

Upvotes: 1
