Reputation: 9457
This is an issue I hit a while ago when I was actively changing indexes. I had not needed to change my indexes for a long time, but yesterday I had to make some changes. Once again, I spent several hours in the following loop:
1. Upload new index definitions
2. Indexing of some of the new, complicated indexes fails after some time
3. Vacuum the failed indexes
4. If the vacuum succeeds, go to step 1; if it fails, go to step 3
I finally managed to get all of them working after several attempts. As step 4 suggests, even "removing an index" failed at times and had to be retried. I have some complicated indexes but not too much data in the grand scheme of things.
I added 50 new indexes (the plan has been to remove 50 others later, but since this is a 24/7 SaaS site the new and old indexes have to coexist for a while). 42 of the new indexes built successfully on the first attempt; the other 8 caused the trouble described above. One characteristic of those 8 is that they index 2-3 orders of magnitude more entities than the others. Here is an example of an index that failed many times before finally succeeding:
- kind: Audit
  ancestor: yes  # shallow hierarchy, probably 3 levels max
  properties:
  - name: events  # list, 20-60 elements in a typical instance
  - name: effective_date  # datetime
    direction: desc
  - name: prop_date  # datetime
    direction: desc
  - name: __key__
    direction: desc
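For context, here is the shape of query this composite index would serve. This is a hypothetical sketch using the Python ndb library; the model, property types, and ancestor key are my assumptions based on the comments above:

from google.appengine.ext import ndb

# Hypothetical model inferred from the index definition (names assumed).
class Audit(ndb.Model):
    events = ndb.StringProperty(repeated=True)  # list, 20-60 elements
    effective_date = ndb.DateTimeProperty()
    prop_date = ndb.DateTimeProperty()

# An ancestor query that filters on the list property and sorts by both
# dates descending needs a composite index of the shape defined above.
parent_key = ndb.Key('Customer', 'acme')  # made-up ancestor key
recent = (Audit.query(ancestor=parent_key)
          .filter(Audit.events == 'login')
          .order(-Audit.effective_date, -Audit.prop_date, -Audit._key)
          .fetch(20))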
What has been your experience with re-indexing existing data? Is there any tip/workaround to avoid this?
Relevant issues I suggest you star:
6133: Improved index management
Upvotes: 0
Views: 85
Reputation: 100
The symptoms you describe indicate you have an exploding index.
The issue is not the number of entities of that kind; it is the number of index entries required. There is a limit of 20,000 index entries per entity, and a related limit of 2MB of encoded index entries. The most common cause of exceeding these limits is a composite index over two repeated properties, since an index entry is required for each pair (the cross product) of their values.
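To make the multiplication concrete, here is a sketch with a hypothetical model and made-up element counts:

from google.appengine.ext import ndb

# Hypothetical kind with two repeated (list) properties.
class Audit(ndb.Model):
    events = ndb.StringProperty(repeated=True)  # say 60 values per entity
    labels = ndb.StringProperty(repeated=True)  # say 400 values per entity

# A composite index over both repeated properties, e.g.
#   - kind: Audit
#     properties:
#     - name: events
#     - name: labels
# requires one entry per (event, label) pair:
#   60 * 400 = 24,000 index entries for a single entity,
# which already exceeds the 20,000-entries-per-entity limit, so writes to
# such an entity (and index builds over it) fail.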
Upvotes: 0
Reputation: 41099
Don't use a lot of indexes, especially complex ones.
Every indexed property and every composite index increases data size and write costs significantly. Instead, you can load more entities and then filter out the unnecessary ones in code.
To illustrate: instead of indexing an address by country, state, and city, you can index it by city only. Sure, there may be cities with the same name within a country or globally, but it's much cheaper to retrieve all records by city alone and then filter out the one or two from the wrong country or state in code.
The same logic applies to any query where the expected number of results is relatively small.
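A minimal sketch of that pattern, assuming a hypothetical Address model and the Python ndb library:

from google.appengine.ext import ndb

class Address(ndb.Model):
    city = ndb.StringProperty()                  # indexed: we query on it
    state = ndb.StringProperty(indexed=False)    # unindexed: cheaper writes
    country = ndb.StringProperty(indexed=False)  # unindexed: cheaper writes

def addresses_in(city, state, country):
    # A single equality filter on one indexed property needs no
    # composite index at all.
    candidates = Address.query(Address.city == city).fetch(100)
    # Filter out the rare same-named cities from other regions in code.
    return [a for a in candidates
            if a.state == state and a.country == country]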
NB: In 7 years of using the Datastore I have never seen an index build/removal fail.
Upvotes: 1