Zakaria Ait Yakoub
Zakaria Ait Yakoub

Reputation: 3

Elasticsearch Index Contains More Documents Than MySQL Query Results

I’m encountering an issue where the number of documents in my Elasticsearch index exceeds the number of rows returned by my MySQL query. Specifically:

This discrepancy persists even though the query results in MySQL are consistent. I suspect something is happening in my Logstash pipeline or Elasticsearch indexing process.



Logstash Configuration:

Here is my Logstash pipeline:

Input Configuration:

jdbc {
  jdbc_connection_string => "jdbc:mysql://ipadresss:3306/SUPER_TEST_TRACK"
  jdbc_user => "admin"
  jdbc_password => "something"
  jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
  jdbc_driver_library => "/usr/share/java/mysql-connector-java-9.1.0.jar"
  jdbc_paging_enabled => true
  jdbc_page_size => 9000
  statement => "SELECT
                  s.site AS siteName,
                  DATE_FORMAT(p.date_production, '%Y-%m-%d') AS productionDate,
                  SUM(p.initial_quantity * pcv.criterion_value) AS ptotalMeterage
                FROM
                  palet p
                JOIN palettes_criteria pc ON p.palet_id = pc.palette_id
                JOIN criterion_value pcv ON pc.criterion_value_id = pcv.criterion_value_id
                JOIN criterion c ON pcv.criterion_id = c.id_criterion
                JOIN site s ON p.id_site = s.id_site
                WHERE
                  c.criterion_name = 'unp'
                GROUP BY
                  s.site, productionDate
                ORDER BY
                  productionDate"
  schedule => "* * * * *"
}

Filter Configuration:

mutate {
  rename => {
    "siteName" => "site_name"
    "productionDate" => "production_date"
    "ptotalMeterage" => "p_total_meterage"
  }
  remove_field => ["@timestamp", "@version"]
}

Output Configuration:

elasticsearch {
  hosts => ["http://ipaddress:9200"]
  index => "produced-palette-meterage-per-site-and-date"
  document_id => "palet-%{sitename}-%{productiondate}-%{ptotalmeterage}"
}

Observations:

  1. Consistent MySQL Query Results:
| siteName  | productionDate | ptotalMeterage      |
| BERRECHID | 2023-12-17     | 14364.279999999972  |
  1. Elasticsearch Index:
{
  "_index": "produced-palette-meterage-per-site-and-date",
  "_type": "_doc",
  "_id": "palet-BERRECHID-2023-12-17-14364.279999999972",
  "_source": {
    "sitename": "BERRECHID",
    "productiondate": "2023-12-17",
    "ptotalmeterage": 14364.279999999972
  }
}

where is the issue and how to solve it?

Upvotes: 0

Views: 21

Answers (0)

Related Questions