Reputation: 2252
I installed the Ingest Attachment Processor plugin in Elasticsearch.
Then I created a very simple PDF file.
I am trying to index the content of this file into Elasticsearch (see the commands below).
However, searching for a word from the file returns no hits (see the answer to command 3 below).
What is wrong, or which step is missing?
Do I need to add a pipeline?
Is the PUT of the PDF correct, and do I need to put the PDF content into the content field of the PUT command?
Console commands:
1 console:
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "indexed_chars" : -1
      }
    }
  ]
}
1 answer:
{
  "acknowledged" : true
}
2 console:
PUT my_index/_doc/001?pipeline=attachment
{
  "filename": "C:\\ELK-Stack\\Test.pdf",
  "data": "VGVzdA0KVGVzdCBEb2t1bWVudCB1bWdld2FuZGVsdCB2b24gd28NCkhpZXIgd2lyZCBnZXRlc3RldC4gRGFzIGlzdCBkZXIgVGVzdA==",
  "attachment": {
    "content_type": "application/rtf",
    "language": "ro",
    "content": "Test Test Dokument umgewandelt von word zu pdf. Hier wird getestet. Das ist der Test."
  },
  "title": "Quick"
}
2 answer:
{
  "_index" : "my_index",
  "_id" : "001",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
3 console:
GET /my_index/_search
{
  "query": {
    "match": {
      "content": "Test"
    }
  }
}
3 answer:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
4 console:
GET /_search
{
  "query": {
    "match_all": {}
  }
}
4 answer:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "001",
        "_score" : 1.0,
        "_source" : {
          "filename" : """C:\ELK-Stack\Test.pdf""",
          "data" : "VGVzdA0KVGVzdCBEb2t1bWVudCB1bWdld2FuZGVsdCB2b24gd28NCkhpZXIgd2lyZCBnZXRlc3RldC4gRGFzIGlzdCBkZXIgVGVzdA==",
          "attachment" : {
            "content_type" : "text/plain; charset=windows-1252",
            "language" : "et",
            "content" : """Test
Test Dokument umgewandelt von wo
Hier wird getestet. Das ist der Test""",
            "content_length" : 77
          },
          "title" : "Quick"
        }
      }
    ]
  }
}
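A quick local check related to the question about the PUT: the sketch below (Python, not part of the original console commands) decodes the data value from command 2 and tests whether it starts with the %PDF- signature that PDF files begin with. The helper name build_document is hypothetical; it only illustrates base64-encoding a file's raw bytes into the data field, on the assumption that the raw bytes are what the attachment processor should receive.

```python
import base64

# The "data" value exactly as sent in command 2.
data = (
    "VGVzdA0KVGVzdCBEb2t1bWVudCB1bWdld2FuZGVsdCB2b24gd28NCkhpZXIgd2lyZCBn"
    "ZXRlc3RldC4gRGFzIGlzdCBkZXIgVGVzdA=="
)

decoded = base64.b64decode(data)

# PDF files start with the signature "%PDF-"; plain text does not.
is_pdf = decoded.startswith(b"%PDF-")
print("looks like a PDF:", is_pdf)
print(decoded.decode("windows-1252"))


def build_document(raw: bytes, filename: str) -> dict:
    """Hypothetical helper: build a request body with the raw file bytes
    base64-encoded into "data". No "attachment" object is supplied here,
    because answer 4 shows the pipeline replacing it with its own output."""
    return {
        "filename": filename,
        "data": base64.b64encode(raw).decode("ascii"),
    }
```

If the check reports that the value does not look like a PDF, the string holds plain text rather than the PDF's bytes, which would fit the text/plain content type shown in answer 4. For a real file, something like build_document(open(path, "rb").read(), path) could produce the body, with path being whatever file is to be indexed.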
Upvotes: 0
Views: 334
Reputation: 2252
Thanks to LeBigCat, I found the solution.
The attachment processor stores the extracted text inside the attachment object, so the query has to use the full path to the field:
"attachment.content": "Test"
(instead of "content": "Test")
GET /my_index/_search
{
  "query": {
    "match": {
      "attachment.content": "Test"
    }
  }
}
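To see why the short name finds nothing, it can help to list the field paths that actually exist in the stored document. The following is a small Python sketch, not Elasticsearch code: the flatten function is a hypothetical helper that walks a copy of the _source from answer 4 (with the long base64 data field left out for brevity) and builds the dotted paths, which is roughly how nested object fields are addressed in a query.

```python
def flatten(source, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key, value in source.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested objects, extending the dotted path.
            yield from flatten(value, path)
        else:
            yield path, value


# Copy of the _source from answer 4; the "data" field is omitted for brevity.
source = {
    "filename": r"C:\ELK-Stack\Test.pdf",
    "attachment": {
        "content_type": "text/plain; charset=windows-1252",
        "language": "et",
        "content": "Test\nTest Dokument umgewandelt von wo\n"
                   "Hier wird getestet. Das ist der Test",
        "content_length": 77,
    },
    "title": "Quick",
}

fields = dict(flatten(source))
for path in sorted(fields):
    print(path)

print("content" in fields)             # no top-level field with this name
print("attachment.content" in fields)  # the path the working query uses
```

The listing contains attachment.content but no plain content, which matches the zero hits in answer 3 and the working query above.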
Upvotes: 0