Tatha

Reputation: 1293

Bulk upload log messages to local Elasticsearch

We have a few external applications in the cloud (IBM Bluemix) which log their application syslogs to the Bluemix Logmet service, which internally uses the ELK stack.

Now, on a periodic basis, we would like to download the logs from the cloud and upload them into a local Elasticsearch/Kibana instance. This is because storing logs in the cloud service incurs cost, and searching them with Kibana incurs additional cost. The local Elasticsearch instance can delete/flush old logs that we don't need.

The downloaded logs look like this:

{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"Hello","type":"syslog","event_uuid":"474b78aa-6012-44f3-8692-09bd667c5822","origin_str":"rep","ALCH_TENANT_ID":"3213cd20-63cc-4592-b3ee-6a204769ce16","logmet_cluster":"topic3-elasticsearch_3","org_name_str":"123","@timestamp":"2017-09-29T02:30:15.598Z","message_type_str":"OUT","@version":"1","space_name_str":"prod","application_id_str":"3104b522-aba8-48e0-aef6-6291fc6f9250","ALCH_ACCOUNT_ID_str":"","org_id_str":"d728d5da-5346-4614-b092-e17be0f9b820","timestamp":"2017-09-29T02:30:15.598Z"}

{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"EFG","type":"syslog","event_uuid":"d902dddb-afb7-4f55-b472-211f1d370837","origin_str":"rep","ALCH_TENANT_ID":"3213cd20-63cc-4592-b3ee-6a204769ce16","logmet_cluster":"topic3-elasticsearch_3","org_name_str":"123","@timestamp":"2017-09-29T02:30:28.636Z","message_type_str":"OUT","@version":"1","space_name_str":"prod","application_id_str":"dcd9f975-3be3-4451-a9db-6bed1d906ae8","ALCH_ACCOUNT_ID_str":"","org_id_str":"d728d5da-5346-4614-b092-e17be0f9b820","timestamp":"2017-09-29T02:30:28.636Z"}

I have created an index in our local Elasticsearch as follows:

curl -XPUT 'localhost:9200/commslog?pretty' -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "logs" : {
            "properties" : {
                "instance_id_str" : { "type" : "text" },
                "source_id_str" : { "type" : "text" },
                "app_name_str" : { "type" : "text" },
                "message" : { "type" : "text" },
                "type" : { "type" : "text" },
                "event_uuid" : { "type" : "text" },
                "ALCH_TENANT_ID" : { "type" : "text" },
                "logmet_cluster" : { "type" : "text" },
                "org_name_str" : { "type" : "text" },
                "@timestamp" : { "type" : "date" },
                "message_type_str" : { "type" : "text" },
                "@version" : { "type" : "text" },
                "space_name_str" : { "type" : "text" },
                "application_id_str" : { "type" : "text" },
                "ALCH_ACCOUNT_ID_str" : { "type" : "text" },
                "org_id_str" : { "type" : "text" },
                "timestamp" : { "type" : "date" }
            }
        }
    }
}'

Now, to bulk upload the file, I used the command:

curl -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/commslog/logs/_bulk --data-binary '@commslogs.json'

The above command throws an error

Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]

The solution is to follow the rules for bulk upload as per

https://discuss.elastic.co/t/bulk-insert-file-having-many-json-entries-into-elasticsearch/46470/2

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

So I manually changed a few of the log statements by adding the action line before every line:

{ "index" : { "_index" : "commslog", "_type" : "logs" } } 

This works!
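For reference, after adding the action lines, the first lines of the bulk file look like this (document bodies shortened here for readability; the _bulk API also expects the data to end with a newline):

{ "index" : { "_index" : "commslog", "_type" : "logs" } }
{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"Hello", ...}
{ "index" : { "_index" : "commslog", "_type" : "logs" } }
{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"EFG", ...}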

Another option was to call the curl command, providing the _index and _type in the path:

curl -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/commslog/logs/_bulk --data-binary '@commslogs.json'

but without the action lines, this too throws the same error.

The problem is that we cannot do this manually for the thousands of log records we get. Is there an option to upload the files as downloaded from Bluemix, without adding the action line to each record?

NOTE: We are not using Logstash at the moment, but

Thanks

Upvotes: 0

Views: 1818

Answers (1)

baudsp

Reputation: 4110

As @Alain Collins said, you should be able to use filebeat directly.

For logstash:

  • It should be possible to use Logstash, but rather than using grok you should use the json codec/filter; it would be much easier (a minimal config sketch follows this list).
  • You can use the file input with Logstash to process many files and wait for it to finish (to know when it's finished, use a file/stdout output, possibly with the dots codec, and wait for it to stop writing).
  • Instead of just transforming the files with Logstash, you should upload them directly to Elasticsearch (with the elasticsearch output).
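A minimal sketch of such a config, assuming the downloaded files sit under /tmp/commslogs/ (the path, index name and type are placeholders to adapt), might look like:

input {
  file {
    path => "/tmp/commslogs/*.json"      # hypothetical location of the downloaded files
    start_position => "beginning"
    sincedb_path => "/dev/null"          # re-read the files on every run
    codec => "json"                      # one JSON document per line
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "commslog"
    document_type => "logs"              # matches the mapping created above (pre-7.x stacks)
  }
  stdout { codec => "dots" }             # prints one dot per event; stops writing when done
}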

As for your problem, I think it will be much easier to just use a small program to add the missing action line, or to use Filebeat, unless you are experienced enough with Logstash to write a Logstash config quicker than a program that adds one line before every line in the document.
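For instance, a minimal sketch of such a program in Python (the script and file names are just examples), assuming one JSON document per line as in the downloaded files:

#!/usr/bin/env python
# add_actions.py - prepend the bulk action/metadata line to every log record
import json
import sys

ACTION = json.dumps({"index": {"_index": "commslog", "_type": "logs"}})

with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
    for line in src:
        line = line.strip()
        if not line:
            continue              # skip blank lines between records
        json.loads(line)          # fail fast if a line is not valid JSON
        dst.write(ACTION + "\n")  # action line expected by the _bulk API
        dst.write(line + "\n")    # the document itself

The output file can then be sent with the same curl command as in the question:

python add_actions.py commslogs.json commslogs_bulk.json
curl -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/commslog/logs/_bulk --data-binary '@commslogs_bulk.json'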

Upvotes: 1
