cwd

Reputation: 54756

In Solr, how can I get a list of one field (document id) for all documents?

I am working with a Solr instance that is populated from an Oracle database. As records are added to and deleted from the Oracle database, they are supposed to be added to and removed from Solr as well.

The schema.xml has this setup, which we use to store the ID that is also the primary key in Oracle:

<uniqueKey>id</uniqueKey>
<field name="id" type="string" indexed="true" stored="true"/>

Furthermore, the IDs are not in sequential order. The Solr admin interface has not been much help: I can only see the IDs along with the rest of each record, a few at a time, paginated.

There are about a million documents in this Solr core.

I can easily get the IDs of the records from the Oracle database, so I would like to also get a list of the document IDs from the Solr index for comparison.

I haven't been able to find any information on how to do this but I may be searching

Upvotes: 3

Views: 4944

Answers (3)

arghtype

Reputation: 4534

For Solr 7, the syntax has changed a bit. This is what worked for me (in Java):

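import java.util.Set;
import java.util.stream.Collectors;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

CloudSolrClient solrClient = ...;
solrClient.setDefaultCollection("collection1");
SolrQuery q = new SolrQuery("*:*");
// return only the `id` field
q.set("fl", "id");
// must be at least the number of documents in the collection
q.setRows(10000000);

// query() throws SolrServerException and IOException
Set<String> uniqueIds = solrClient.query(q).getResults()
  .stream().map(x -> (String) x.get("id"))
  .collect(Collectors.toSet());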
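
Asking for ten million rows in one response can be heavy on both Solr and the client. Solr also supports cursor-based paging through the `cursorMark` request parameter: the first request sends `*`, each response carries a `nextCursorMark`, and paging is finished when the cursor that comes back equals the one that was sent. The sketch below shows only that loop. `CursorPaging`, `Page`, and `fetchPage` are hypothetical names; it assumes `fetchPage` wraps the real Solr call (in SolrJ, setting `cursorMark` and a sort on `id` on the query and reading `getNextCursorMark()` from the response).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class CursorPaging {

    // One page of results: the ids on the page plus the cursor for the next request.
    // Hypothetical holder type, not part of SolrJ.
    static class Page {
        final List<String> ids;
        final String nextCursorMark;

        Page(List<String> ids, String nextCursorMark) {
            this.ids = ids;
            this.nextCursorMark = nextCursorMark;
        }
    }

    // Calls fetchPage repeatedly, starting from "*", and stops once the
    // returned cursor equals the cursor that was sent.
    static List<String> collectAllIds(Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        String cursor = "*";
        while (true) {
            Page page = fetchPage.apply(cursor);
            all.addAll(page.ids);
            if (cursor.equals(page.nextCursorMark)) {
                break; // no further pages
            }
            cursor = page.nextCursorMark;
        }
        return all;
    }
}
```

With a moderate page size (say a few thousand rows per request) this keeps each response small while still visiting every document once.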

Upvotes: 0

Alexandre Rafalovitch

Reputation: 9789

In the latest Solr (4.10), you can export a large number of records.

However, if you really just want one field, you can make a request with that one field and export as CSV. That minimizes the formatting overhead.
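
A minimal sketch of the single-field CSV request, using only the JDK (Java 11+). The class name `SolrCsvIds` and the base URL are placeholders, and the parser assumes the ids contain no commas, quotes, or line breaks, so each CSV line is exactly one id. `wt=csv` selects Solr's CSV response writer and `csv.header=false` drops the header row.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

class SolrCsvIds {

    // Builds a /select URL that returns only the id field as header-less CSV.
    // baseUrl is a placeholder such as "http://localhost:8983/solr/mycore".
    static String buildIdExportUrl(String baseUrl, int rows) {
        String q = URLEncoder.encode("*:*", StandardCharsets.UTF_8);
        return baseUrl + "/select?q=" + q
                + "&fl=id&wt=csv&csv.header=false&rows=" + rows;
    }

    // Turns a CSV body with one id per line into a set, skipping blank lines.
    static Set<String> parseIds(String csvBody) {
        Set<String> ids = new LinkedHashSet<>();
        for (String line : csvBody.split("\\R")) {
            String id = line.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    // Fetches the CSV over HTTP and parses it.
    static Set<String> fetchIds(String baseUrl, int rows)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(buildIdExportUrl(baseUrl, rows)))
                .build();
        String body = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
        return parseIds(body);
    }
}
```

For the core in the question, `rows` would need to be at least the number of documents (about a million).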

Upvotes: 1

Sylvain Leroux

Reputation: 51990

If you really need to get the id of all your documents, use the fl parameter. Something like this (untested):

SolrQuery q = new SolrQuery("*:*");
q.setFields("id");
//          ^^^^
// return only the `id` field (sets the `fl` parameter)
q.setRows(10000000);
//        ^^^^^^^^
// insanely high number: retrieve _all_ rows
// see: http://wiki.apache.org/solr/CommonQueryParameters#rows-1
return server.query(q).getResults();

For a simple comparison between the content in Oracle and in Solr, you might just want to count documents (untested):

SolrQuery q = new SolrQuery("*:*");
q.setRows(0);
//        ^
// don't retrieve _any_ row
return server.query(q).getResults().getNumFound();
//                                  ^^^^^^^^^^^^^
//                             just get the number of matching documents
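
Matching counts do not prove the two sides hold the same records, since one missing document and one stale document cancel out. Once both id lists are available, a set difference shows exactly which ids are out of sync. A minimal sketch with made-up sample ids; `IdDiff` and `missingFrom` are hypothetical names:

```java
import java.util.Set;
import java.util.TreeSet;

class IdDiff {

    // Returns the ids present in `left` but absent from `right`, sorted.
    static Set<String> missingFrom(Set<String> left, Set<String> right) {
        Set<String> diff = new TreeSet<>(left);
        diff.removeAll(right);
        return diff;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the real id lists.
        Set<String> oracleIds = Set.of("A17", "B02", "C33");
        Set<String> solrIds = Set.of("B02", "C33", "Z99");

        // prints "Not indexed: [A17]"
        System.out.println("Not indexed: " + missingFrom(oracleIds, solrIds));
        // prints "Stale in Solr: [Z99]"
        System.out.println("Stale in Solr: " + missingFrom(solrIds, oracleIds));
    }
}
```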

Upvotes: 7
