Reputation: 14689
I have a CSV file with the following structure:
col1, col2, col3
1|E|D
2|A|F
3|E|F
...
I am trying to index it in Elasticsearch using Logstash, so I created the following Logstash configuration file:
input {
  file {
    path => "/path/to/data"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => "|"
    columns => ["col1","col2","col3"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"
    document_type => "mydoctype"
  }
  stdout {}
}
But Logstash halts with no messages except the following:
$ /opt/logstash/bin/logstash -f logstash.conf
Settings: Default pipeline workers: 8
Pipeline main started
Increasing the verbosity gives the following output (which doesn't include any particular error):
$ /opt/logstash/bin/logstash -v -f logstash.conf
starting agent {:level=>:info}
starting pipeline {:id=>"main", :level=>:info}
Settings: Default pipeline workers: 8
Registering file input {:path=>["/path/to/data"], :level=>:info}
No sincedb_path set, generating one based on the file path {:sincedb_path=>"/home/username/.sincedb_55b24c6ff18079626c5977ba5741584a", :path=>["/path/to/data"], :level=>:info}
Using mapping template from {:path=>nil, :level=>:info}
Attempting to install template {:manage_template=>{"template"=>"logstash-*", "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "omit_norms"=>true}, "dynamic_templates"=>[{"message_field"=>{"match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}, "fields"=>{"raw"=>{"type"=>"string", "index"=>"not_analyzed", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"string", "index"=>"not_analyzed"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"float"}, "longitude"=>{"type"=>"float"}}}}}}}, :level=>:info}
New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["localhost:9200"], :level=>:info}
Starting pipeline {:id=>"main", :pipeline_workers=>8, :batch_size=>125, :batch_delay=>5, :max_inflight=>1000, :level=>:info}
Pipeline main started
Any advice on how to index the CSV file?
Upvotes: 0
Views: 208
Reputation: 603
Since Logstash tries to be smart about not replaying old file lines, you can try using a tcp input and netcat'ing the file to the open port.
The input section would look like:
input {
  tcp {
    port => 12345
  }
}
Then, once Logstash is running and listening on the port, you can send your data in with:
cat /path/to/data | nc localhost 12345
Upvotes: 1
Reputation: 16362
If, during your testing, you've processed the file before, Logstash keeps a record of that (the inode and byte offset) in the sincedb file referenced in your verbose output. You can remove that file (if it's not needed), or set sincedb_path in your file{} input, as sketched below.
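For example, a minimal sketch of the file{} input with sincedb_path pointed at /dev/null (an option useful while testing, since it discards the stored read position so the file is re-read on every run; adjust the path and other settings to your setup):
input {
  file {
    path => "/path/to/data"
    start_position => "beginning"
    # don't persist the read position; reprocess the file each run (testing only)
    sincedb_path => "/dev/null"
  }
}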
Upvotes: 1