Reputation: 1969
I'm trying to use Google Cloud Datastore to store METAR observations (airport weather observations) but I am experiencing what I think is exploding indexes. My index for station_id
(which is a 4 character string) is 20 times larger than the actual data itself. The database will increase by roughly 250 000 entities per day, so index size will become an issue.
Table
- observation_time (Date / Time) - indexed
- raw_text (String) (which is ~200 characters) - unindexed
- station_id (String) (which is always 4 characters) - indexed
Composite index:
- station_id (ASC), observation_time (ASC)
Query
The only query I will ever run is:
query.add_filter('station_id', '=', station_icao)
query.add_filter('observation_time', '>=', before)
query.add_filter('observation_time', '<=', after)
where before
and after
are datetime values
Index sizes
name type count size index size
observation_time Date/Time 1,096,184 26.14MB 313.62MB
station_id String 1,096,184 16.73MB 294.8MB
Datastore reports:
Resource Count Size
Entities 1,096,184 244.62MB
Built-in-indexes 5,488,986 740.63MB
Composite indexes 1,096,184 137.99MB
Help
I guess my first question is: What am I missing? I assume I'm doing something un-optimized, but I can't figure out what. Query time is not an immediate issue here, as long as lookups stays below ~2s.
Can I simply remove the built-in indexes, will the composite continue to work?
I've read up on Google and StackOverflow but can't seem to wrap my head around this. The reason I simply don't try to remove all built-in indexes is that it takes quite some time to download/un-index/put all the data afterwards I need to way 48hours for the dashboard summary to update - ie it will take me days before I get a result.
Upvotes: 1
Views: 1303
Reputation: 42018
As +Jeffrey Rennie pointed out, "Exploding Indexes" is a very specific term that does not apply here.
You can see how storage size is calculate from our documentation here, so you can apply it to your example to see where the size adds up.
TL;DR: You can save space by using slightly more concise (but still readable!) property names. For example, observation_time
to observation
, etc
Key things to keep in mind:
Upvotes: 2