Reputation: 163
Below is what my rdns.json file looks like; it has around 1 billion records. I have tried several ways to import the file, but failed badly.
{"timestamp":"1573629372","name":"1.10.178.205","hostname":"node-a19.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573636816","name":"1.10.178.206","hostname":"node-a1a.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573647966","name":"1.10.178.207","hostname":"node-a1b.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573650758","name":"1.10.178.208","hostname":"node-a1c.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573660230","name":"1.10.178.209","hostname":"node-a1d.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573652982","name":"1.10.178.21","hostname":"node-9w5.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573614753","name":"1.10.178.210","hostname":"node-a1e.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573616716","name":"1.10.178.211","hostname":"node-a1f.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573626432","name":"1.10.178.212","hostname":"node-a1g.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573611374","name":"1.10.178.213","hostname":"node-a1h.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573655790","name":"1.10.178.214","hostname":"node-a1i.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573635098","name":"1.10.178.215","hostname":"node-a1j.pool-1-10.dynamic.totinternet.net","type":"ptr"}
{"timestamp":"1573628481","name":"1.10.178.216","hostname":"node-a1k.pool-1-10.dynamic.totinternet.net","type":"ptr"}
Could someone please guide me on how I can import this file into Elasticsearch?
Upvotes: 4
Views: 3079
Reputation: 163
The solution was to use elasticsearch_loader.
It handled my 128 GB file very nicely and imported it without needing any reformatting of the file. The command I used was:
elasticsearch_loader --index rdns --type dnsrecords json rdns.json --lines
Do note that it takes quite some time to post the data, though.
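If you don't have the tool yet, it's published on PyPI (assuming a standard Python/pip setup):
pip install elasticsearch-loader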
Upvotes: 3
Reputation: 7770
There's nothing like using a native way to upload a file to Elasticsearch, but have you considered using Node.js streams, newline-delimited JSON, and etl to stream the file and bulk-index it into Elasticsearch as it is read? Basically something like:
const es = require("elasticsearch");
const etl = require("etl");
const ndjson = require("ndjson");
const fs = require("fs");

const esClient = new es.Client({
  log: "trace"
});

fs.createReadStream(`${__dirname}/test.json`)
  .pipe(ndjson.parse())  // parse the newline-delimited JSON, one document per line
  .pipe(etl.collect(10)) // batch size; this could be anything depending on your single-document size and Elasticsearch cluster configuration
  .pipe(etl.elastic.index(esClient, "someindex", "someType")) // bulk-index each batch
  .promise()
  .then(res => console.log(res))
  .catch(err => console.log(err));
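As a rough usage note (not part of the original snippet): etl.collect(10) buffers ten parsed documents before each bulk request, so with ~1 billion small records you would likely want a much larger batch, trading client memory for fewer round trips to the cluster.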
Upvotes: 3
Reputation: 104
How large is your JSON file? Elasticsearch caps the size of a single HTTP request body (http.max_content_length, 100 MB by default), so a common method of importing large datasets is to split your JSON data into smaller batches and upload them one at a time.
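A minimal sketch of that approach, assuming a cluster at localhost:9200 and a target index named rdns (both placeholders) on Elasticsearch 7+ (older versions also need a _type in the action line):

# Split the NDJSON file into 100,000-line chunks (chunk_aa, chunk_ab, ...)
split -l 100000 rdns.json chunk_
for f in chunk_*; do
  # Interleave the bulk "index" action line before every document
  awk '{print "{\"index\":{\"_index\":\"rdns\"}}"; print}' "$f" > "$f.bulk"
  # POST the chunk to the _bulk API; the NDJSON content type is required
  curl -s -H "Content-Type: application/x-ndjson" \
    -XPOST "http://localhost:9200/_bulk" --data-binary "@$f.bulk"
  rm "$f.bulk"
done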
Some references:
https://discuss.elastic.co/t/loading-many-big-json-files-into-elasticsearch/128078/5
https://www.elastic.co/guide/en/elasticsearch/reference/current/general-recommendations.html
Upvotes: 0