iqzer0

Reputation: 163

Importing a large JSON file into Elasticsearch

Below is what my rdns.json file looks like; it has around 1 billion records. I have tried several ways to import the file, but all failed badly.

{"timestamp":"1573629372","name":"1.10.178.205","hostname":"node-a19.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573636816","name":"1.10.178.206","hostname":"node-a1a.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573647966","name":"1.10.178.207","hostname":"node-a1b.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573650758","name":"1.10.178.208","hostname":"node-a1c.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573660230","name":"1.10.178.209","hostname":"node-a1d.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573652982","name":"1.10.178.21","hostname":"node-9w5.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573614753","name":"1.10.178.210","hostname":"node-a1e.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573616716","name":"1.10.178.211","hostname":"node-a1f.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573626432","name":"1.10.178.212","hostname":"node-a1g.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573611374","name":"1.10.178.213","hostname":"node-a1h.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573655790","name":"1.10.178.214","hostname":"node-a1i.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573635098","name":"1.10.178.215","hostname":"node-a1j.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573628481","name":"1.10.178.216","hostname":"node-a1k.pool-1-10.dynamic.totinternet.net","type":"ptr"}

Could someone please guide me on how I can import this file into Elasticsearch?

Upvotes: 4

Views: 3079

Answers (3)

iqzer0

Reputation: 163

The solution was to use elasticsearch_loader.

It handled my 128 GB file very nicely and imported it without requiring any reformatting of the file. The command I used was:

elasticsearch_loader --index rdns --type dnsrecords json rdns.json --lines

Do note that it takes quite a while to post the data, though.
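For anyone following along: elasticsearch_loader is a Python CLI, so assuming a working Python/pip setup it is installed from PyPI before running the command above:

pip install elasticsearch-loader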

Upvotes: 3

Ashish Modi

Reputation: 7770

Nothing beats using a native way to upload the file to Elasticsearch, but have you considered using Node.js streams, newline-delimited JSON, and etl to stream the file and bulk-index it into Elasticsearch as you go? Basically something like:

const es = require("elasticsearch");
const etl = require("etl");
const ndjson = require("ndjson");
const fs = require("fs");

const esClient = new es.Client({
  log: "trace" // verbose logging; connects to http://localhost:9200 by default
});

fs.createReadStream(`${__dirname}/test.json`)
  .pipe(ndjson.parse()) // parse the newline-delimited JSON, one document per line
  .pipe(etl.collect(10)) // batch size; tune to your document size and cluster configuration
  .pipe(etl.elastic.index(esClient, "someindex", "someType")) // bulk index each batch
  .promise()
  .then(res => console.log(res))
  .catch(err => console.log(err));
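The sketch assumes the (now legacy) elasticsearch client plus the etl and ndjson packages, installed with something like:

npm install elasticsearch etl ndjson

etl.collect(10) groups ten parsed documents into each bulk request; a larger batch cuts HTTP round-trips at the cost of bigger request bodies.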

Upvotes: 3

hdump

Reputation: 104

How large is your JSON file? I believe Elasticsearch limits the size of a single request (http.max_content_length defaults to 100 MB), so you cannot upload a huge file in one go. A common method of importing large datasets into Elasticsearch is to split your JSON data into smaller sets, then upload them one at a time, as in the sketch below.
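A rough sketch of that approach, assuming Elasticsearch 7+ on localhost:9200 and a target index named rdns (both illustrative): split the file with standard Unix tools, prepend the bulk action line each document needs, and POST every chunk to the _bulk API.

split -l 500000 rdns.json chunk_
for f in chunk_*; do
  awk '{print "{\"index\":{}}"; print}' "$f" > "$f.bulk"
  curl -s -H 'Content-Type: application/x-ndjson' \
    -XPOST 'http://localhost:9200/rdns/_bulk' --data-binary "@$f.bulk"
done

At roughly 120 bytes per record, 500,000 lines per chunk keeps each request near 60 MB, safely under the default limit. Note --data-binary (not -d) so curl preserves the newlines the bulk API requires.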

Some references:

https://discuss.elastic.co/t/loading-many-big-json-files-into-elasticsearch/128078/5

https://www.elastic.co/guide/en/elasticsearch/reference/current/general-recommendations.html

Upvotes: 0
