Reputation: 505
I can't figure out how to send the scroll_id to Elasticsearch using cURL.
This is what I have tried so far, but it doesn't work.
$url = "http://distribution.virk.dk/cvr-permanent/virksomhed/_search?scroll=2m&_scroll_id=".$_POST["scroll_id"];
$data = array(
    "_scroll_id" => $_POST["scroll_id"],
    "scroll_id" => $_POST["scroll_id"],
    "size" => 10,
    "_source" => array(
        "Vrvirksomhed.cvrNummer",
        "Vrvirksomhed.elektroniskPost",
        "Vrvirksomhed.livsforloeb",
        "Vrvirksomhed.hjemmeside",
        "Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn",
        "Vrvirksomhed.hovedbranche",
        "Vrvirksomhed.penheder",
        "Vrvirksomhed.telefonnummer",
        "Vrvirksomhed.virksomhedMetadata.nyesteBeliggenhedsadresse"
    ),
    "query" => array(
        "bool" => array(
            "must_not" => array(
                "exists" => array(
                    "field" => "Vrvirksomhed.livsforloeb.periode.gyldigTil"
                )
            )
        )
    )
);
Elasticsearch returns the same 10 hits every time, so I suspect it isn't picking up the scroll_id.
Updated code after trying Val's suggestion. With setHosts, the request times out after a long wait. Without setHosts, I get the error "No alive nodes found in your cluster".
use Elasticsearch\ClientBuilder;
require 'vendor/autoload.php';
$username = "MY_USERNAME";
$password = "MY_PASSWORD";
$hosts = [
    'host' => 'distribution.virk.dk',
    'scheme' => 'http',
    'path' => '/cvr-permanent',
    'port' => '80',
    'user' => $username,
    'pass' => $password
];
$client = ClientBuilder::create()->setHosts($hosts)->build();
$params = [
    'scroll' => '30s',
    'size' => 50,
    'type' => '/cvr-permanent/virksomhed',
    'index' => 'virksomhed',
    'body' => [
        'query' => [
            'match_all' => new \stdClass()
        ]
    ]
];
// Execute the search
// The response will contain the first batch of documents
// and a scroll_id
$response = $client->search($params);
// Now we loop until the scroll "cursors" are exhausted
while (isset($response['hits']['hits']) && count($response['hits']['hits']) > 0) {
    // **
    // Do your work here, on the $response['hits']['hits'] array
    // **
    // When done, get the new scroll_id
    // You must always refresh your _scroll_id! It can change sometimes
    $scroll_id = $response['_scroll_id'];
    // Execute a Scroll request and repeat
    $response = $client->scroll([
        'body' => [
            'scroll_id' => $scroll_id, //...using our previously obtained _scroll_id
            'scroll' => '30s' // and the same timeout window
        ]
    ]);
}
Upvotes: 0
Views: 1259
Reputation: 217314
There are two steps to using the scroll API.
In the first step, you send the query to the _search endpoint together with the duration of the scroll context (the scroll parameter).
In the second step, you don't send the query again. You send only the scroll id returned by the previous request, plus the scroll duration, to the _search/scroll endpoint.
You can find a full-blown example here
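As a rough sketch of those two steps with plain cURL in PHP: the helper names below (buildInitialScrollRequest, buildNextScrollRequest, sendJson) are made up for illustration. The sketch assumes the server accepts a JSON body on _search/scroll at its root and uses HTTP basic auth. Neither is confirmed for this particular endpoint, so adjust as needed.

```php
<?php
// Step 1 (hypothetical helper): the initial search carries the query
// and the scroll keep-alive, but no scroll id.
function buildInitialScrollRequest(string $baseUrl, string $indexPath, array $query, string $keepAlive = '2m', int $size = 10): array
{
    return [
        'url'  => rtrim($baseUrl, '/') . '/' . trim($indexPath, '/')
                  . '/_search?scroll=' . urlencode($keepAlive),
        'body' => json_encode(['size' => $size, 'query' => $query]),
    ];
}

// Step 2 (hypothetical helper): follow-up requests go to _search/scroll
// and carry only the keep-alive and the scroll id, never the query.
function buildNextScrollRequest(string $baseUrl, string $scrollId, string $keepAlive = '2m'): array
{
    return [
        'url'  => rtrim($baseUrl, '/') . '/_search/scroll',
        'body' => json_encode(['scroll' => $keepAlive, 'scroll_id' => $scrollId]),
    ];
}

// Sends one request as a JSON POST and decodes the response.
// Basic auth is an assumption about how this server authenticates.
function sendJson(array $request, string $user, string $pass): array
{
    $ch = curl_init($request['url']);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $request['body'],
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_USERPWD        => $user . ':' . $pass,
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $raw = curl_exec($ch);
    curl_close($ch);
    if ($raw === false) {
        throw new RuntimeException('cURL request failed');
    }
    $decoded = json_decode($raw, true);
    if (!is_array($decoded)) {
        throw new RuntimeException('Response was not valid JSON');
    }
    return $decoded;
}

// Usage (not executed here, since it needs network access and credentials):
// $base  = 'http://distribution.virk.dk';
// $query = ['bool' => ['must_not' => ['exists' =>
//     ['field' => 'Vrvirksomhed.livsforloeb.periode.gyldigTil']]]];
// $page = sendJson(buildInitialScrollRequest($base, 'cvr-permanent/virksomhed', $query), $user, $pass);
// while (!empty($page['hits']['hits'])) {
//     // ... process $page['hits']['hits'] ...
//     $page = sendJson(buildNextScrollRequest($base, $page['_scroll_id']), $user, $pass);
// }
```

The scroll id is read from each response rather than reused from the first one, because it can change between requests.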
Upvotes: 1