Reputation: 2222
When data from a device is indexed into Elasticsearch, duplicates are created, and I would like to avoid them. I'm using an IElasticClient object with .NET and NEST to index the data.
I searched for a method like ElasticClient.SetDocumentId(), but can't find one.
_doc doc = (_doc)obj;
HashObject hashObject = new HashObject { DataRecordId = doc.DataRecordId, TimeStamp = doc.Timestamp };
// hashId should be the document ID.
int hashId = hashObject.GetHashCode();
ElasticClient.IndexDocumentAsync(doc);
Right now every write adds another copy of the same object. I would like the existing document in Elasticsearch to be updated instead.
Upvotes: 1
Views: 1048
Reputation: 2222
Thank you very much Russ for this detailed and easy to understand description! :-)
The HashObject was only meant as a helper to get a unique ID from my real _doc object. I have now added an Id property to my _doc class; the rest is shown in the code below. I no longer get duplicates in Elasticsearch.
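public void Create(object obj)
{
    _doc doc = (_doc)obj;
    // Build a key from the fields that identify a record, then hash it.
    // Caveat: string.GetHashCode() is not guaranteed to be stable across
    // processes or .NET versions, and distinct keys can collide.
    string idAsString = doc.DataRecordId.ToString() + doc.Timestamp.ToString();
    int hashId = idAsString.GetHashCode();
    // NEST infers the document _id from the Id property.
    doc.Id = hashId;
    // Note: the returned Task is not awaited here.
    ElasticClient.IndexDocumentAsync(doc);
}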
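A hash is not strictly needed for this: the identifying fields can be combined directly into a string id, which stays the same across application restarts. The following is only a minimal sketch. DocumentIdHelper and StableId are hypothetical names (not part of NEST), DataRecordId is assumed to be an int, and the Id property on _doc would have to be a string rather than an int.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of NEST or Elasticsearch.
public static class DocumentIdHelper
{
    // Builds a deterministic id from the fields that identify a record.
    // Assumption: dataRecordId is an int, as in the HashObject example.
    public static string StableId(int dataRecordId, DateTime timestamp)
    {
        // The round-trip ("O") format with the invariant culture yields the
        // same text on every machine. Note that the DateTime Kind is part of
        // the output, so use the same Kind (e.g. UTC) consistently.
        return dataRecordId.ToString(CultureInfo.InvariantCulture)
            + "_"
            + timestamp.ToString("O", CultureInfo.InvariantCulture);
    }
}
```

With a string Id property, the assignment in Create would become doc.Id = DocumentIdHelper.StableId(doc.DataRecordId, doc.Timestamp); so indexing the same record again targets the same document.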
Upvotes: 0
Reputation: 125528
Assuming the following set up
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var settings = new ConnectionSettings(pool)
    .DefaultIndex("example")
    .DefaultTypeName("_doc");
var client = new ElasticClient(settings);

public class HashObject
{
    public int DataRecordId { get; set; }
    public DateTime TimeStamp { get; set; }
}
If you want to set the Id for a document explicitly on the request, you can do so with the fluent syntax
var indexResponse = client.Index(new HashObject(), i => i.Id("your_id"));
or with the object initializer syntax
var indexRequest = new IndexRequest<HashObject>(new HashObject(), id: "your_id");
var indexResponse = client.Index(indexRequest);
both result in a request
PUT http://localhost:9200/example/_doc/your_id
{
"dataRecordId": 0,
"timeStamp": "0001-01-01T00:00:00"
}
As Rob pointed out in the question comments, NEST has a convention whereby it can infer the Id from the document itself, by looking for a property on the CLR POCO named Id. If it finds one, it will use that as the Id for the document. This does mean that an Id value ends up being stored in _source (and indexed, but you can disable this in the mappings), but it is useful because the Id value is automatically associated with the document and used when needed.
If HashObject is updated to have an Id property, now we can just do
var indexResponse = client.IndexDocument(new HashObject { Id = 1 });
or
var indexRequest = new IndexRequest<HashObject>(new HashObject { Id = 1 });
var indexResponse = client.Index(indexRequest);
which will send the request
PUT http://localhost:9200/example/_doc/1
{
"id": 1,
"dataRecordId": 0,
"timeStamp": "0001-01-01T00:00:00"
}
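For reference, the updated class might look like the sketch below. The int type for Id is an assumption taken from the Id = 1 usage above; the other two properties are unchanged from the original set up.

```csharp
using System;

// HashObject extended with an Id property, which NEST's id inference
// convention picks up and uses as the document _id.
public class HashObject
{
    // Assumption: int, matching the Id = 1 example above.
    public int Id { get; set; }
    public int DataRecordId { get; set; }
    public DateTime TimeStamp { get; set; }
}
```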
If your documents do not have an id field in the _source, you'll need to handle the _id values from the hits metadata from each hit yourself. For example
var searchResponse = client.Search<HashObject>(s => s
    .MatchAll()
);
foreach (var hit in searchResponse.Hits)
{
    var id = hit.Id;
    var document = hit.Source;
    // do something with them
}
Upvotes: 1