Mohamed Ali JAMAOUI

Reputation: 14689

Logstash not responding when trying to index a CSV file

I have a CSV file with the following structure:

col1, col2, col3 
1|E|D
2|A|F
3|E|F
... 

I am trying to index it into Elasticsearch using Logstash, so I created the following Logstash configuration file:

input {
  file {
    path => "/path/to/data"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => "|"
    columns => ["col1","col2","col3"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"
    document_type => "mydoctype"
  }
  stdout {}
}

But Logstash hangs with no messages except the following:

$ /opt/logstash/bin/logstash -f logstash.conf
Settings: Default pipeline workers: 8
Pipeline main started

Increasing the verbosity gives the following output (which doesn't include any particular error):

$ /opt/logstash/bin/logstash -v -f logstash.conf
starting agent {:level=>:info}
starting pipeline {:id=>"main", :level=>:info}
Settings: Default pipeline workers: 8
Registering file input {:path=>["/path/to/data"], :level=>:info}
No sincedb_path set, generating one based on the file path {:sincedb_path=>"/home/username/.sincedb_55b24c6ff18079626c5977ba5741584a", :path=>["/path/to/data"], :level=>:info}
Using mapping template from {:path=>nil, :level=>:info}
Attempting to install template {:manage_template=>{"template"=>"logstash-*", "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "omit_norms"=>true}, "dynamic_templates"=>[{"message_field"=>{"match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}, "fields"=>{"raw"=>{"type"=>"string", "index"=>"not_analyzed", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"string", "index"=>"not_analyzed"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"float"}, "longitude"=>{"type"=>"float"}}}}}}}, :level=>:info}
New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["localhost:9200"], :level=>:info}
Starting pipeline {:id=>"main", :pipeline_workers=>8, :batch_size=>125, :batch_delay=>5, :max_inflight=>1000, :level=>:info}
Pipeline main started

Any advice on how to index the CSV file?

Upvotes: 0

Views: 208

Answers (2)

Console Catzirl

Reputation: 603

Since Logstash tries to be smart about not replaying file lines it has already seen, you can work around this by using a tcp input and sending the file to the open port with netcat.

The input section would look like:

input {
  tcp {
    port => 12345
  }
}

Then, once Logstash is running and listening on the port, you can send your data in with:

cat /path/to/data | nc localhost 12345
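
For completeness, the full pipeline might then look like the following sketch, which just swaps the tcp input into the configuration from the question (the port number is arbitrary; the csv filter and elasticsearch output are reused as-is):

input {
  tcp {
    # each line received on this port becomes one event
    port => 12345
  }
}
filter {
  csv {
    separator => "|"
    columns => ["col1","col2","col3"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"
    document_type => "mydoctype"
  }
  stdout {}
}

Because the tcp input has no notion of a file position, every run of the netcat command replays the whole file, which makes this handy for testing.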

Upvotes: 1

Alain Collins

Reputation: 16362

If, during your testing, you've processed the file before, Logstash keeps a record of that (the inode and byte offset) in the sincedb file shown in your verbose output. You can remove that file (if it's not needed), or set sincedb_path in your file{} input.
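
For example, a common trick while testing is to point sincedb_path at /dev/null so that the read position is never persisted and the file is replayed from the beginning on every run (a minimal sketch based on the file input from the question; /dev/null assumes a Unix-like system):

input {
  file {
    path => "/path/to/data"
    start_position => "beginning"
    # writing the position record to /dev/null discards it,
    # so Logstash rereads the whole file each time it starts
    sincedb_path => "/dev/null"
  }
}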

Upvotes: 1
