Niklas B
Niklas B

Reputation: 1969

Cloud Datastore avoid exploding indexes on very simple table

I'm trying to use Google Cloud Datastore to store METAR observations (airport weather observations) but I am experiencing what I think is exploding indexes. My index for station_id (which is a 4 character string) is 20 times larger than the actual data itself. The database will increase by roughly 250 000 entities per day, so index size will become an issue.

Table

 - observation_time (Date / Time) - indexed
 - raw_text (String) (which is ~200 characters) - unindexed
 - station_id (String) (which is always 4 characters) - indexed

Composite index:

  - station_id (ASC), observation_time (ASC)

Query

The only query I will ever run is:

query.add_filter('station_id', '=', station_icao)
query.add_filter('observation_time', '>=', before)
query.add_filter('observation_time', '<=', after)

where before and after are datetime values

Index sizes

name               type         count         size      index size
observation_time   Date/Time    1,096,184     26.14MB   313.62MB    
station_id         String       1,096,184     16.73MB   294.8MB 

Datastore reports:

Resource           Count        Size
Entities           1,096,184    244.62MB
Built-in-indexes   5,488,986    740.63MB
Composite indexes  1,096,184    137.99MB

Help

I guess my first question is: What am I missing? I assume I'm doing something un-optimized, but I can't figure out what. Query time is not an immediate issue here, as long as lookups stays below ~2s.

Can I simply remove the built-in indexes, will the composite continue to work?

I've read up on Google and StackOverflow but can't seem to wrap my head around this. The reason I simply don't try to remove all built-in indexes is that it takes quite some time to download/un-index/put all the data afterwards I need to way 48hours for the dashboard summary to update - ie it will take me days before I get a result.

Upvotes: 1

Views: 1303

Answers (1)

Dan McGrath
Dan McGrath

Reputation: 42018

As +Jeffrey Rennie pointed out, "Exploding Indexes" is a very specific term that does not apply here.

You can see how storage size is calculate from our documentation here, so you can apply it to your example to see where the size adds up.

TL;DR: You can save space by using slightly more concise (but still readable!) property names. For example, observation_time to observation, etc

Key things to keep in mind:

  • To have a composite index, you need to have the individual properties indexed, so don't remove the built-ins or it'll stop working
  • Built-ins are indexed twice - once for ascending and once for descending
  • Kind names and property names are strings used in the index for each entity, so the longer they are the bigger the indexes

Upvotes: 2

Related Questions