Francis Malloch

Reputation: 1094

Storing large amounts of queryable JSON

I am trying to find a database solution that is capable of the following:

  1. Store flat, arbitrary JSON structures, separated by table name (random_json_table_1, random_json_table_2, for example).
  2. Capable of handling a large number of insert operations (10,000+ per second).
  3. Able to query the arbitrary JSON structures, for example: SELECT * FROM random_json_table_1 WHERE JSON_SELECT('data', '$.city.busses') IS NOT NULL AND JSON_SELECT('data', '$.city.busStops', 'length') > 5 (pseudo-SQL; a concrete sketch follows below).
  4. SELECT queries must run fast over gigabytes of data.

I had a look at Amazon Athena and it looks a bit promising, but I am curious whether there are any other solutions out there.
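
To make point 3 concrete, here is roughly how I imagine that filter looking in Athena's Presto SQL dialect (just a sketch, assuming the JSON documents sit in a string column named data):

SELECT *
FROM random_json_table_1
WHERE json_extract(data, '$.city.busses') IS NOT NULL
  AND json_array_length(json_extract(data, '$.city.busStops')) > 5;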

Upvotes: 0

Views: 258

Answers (2)

Yun Zhang

Reputation: 5503

You may consider BigQuery. Regarding (2), there is the BigQuery streaming insert interface. And for (4), you can play with the BigQuery public datasets (e.g. the popular Bitcoin transactions table) to see how fast BigQuery can be.

Below is a sample query using BigQuery Standard SQL, showing how to filter data that is stored as JSON strings.

#standardSQL
-- Each element of the array literal stands in for a stored JSON document.
SELECT JSON_EXTRACT(json_text, '$') AS student
FROM UNNEST([
  '{"age" : 1, "class" : {"students" : [{"name" : "Jane"}]}}',
  '{"age" : 2, "class" : {"students" : []}}',
  '{"age" : 10,"class" : {"students" : [{"name" : "John"}, {"name": "Jamie"}]}}'
  ]) AS json_text
-- JSON_EXTRACT_SCALAR pulls the age field out as a string; CAST makes it comparable.
WHERE CAST(JSON_EXTRACT_SCALAR(json_text, '$.age') AS INT64) > 5;
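
For a filter closer to the question's point 3 (an array-length condition), JSON_EXTRACT_ARRAY combined with ARRAY_LENGTH should work. The following is only a sketch, assuming your JSON documents are stored in a STRING column:

#standardSQL
-- Keep rows where city.busses exists and city.busStops has more than 5 entries.
SELECT data
FROM UNNEST([
  '{"city" : {"busses" : 3, "busStops" : [1, 2, 3, 4, 5, 6]}}',
  '{"city" : {"busStops" : [1, 2]}}'
  ]) AS data
WHERE JSON_EXTRACT(data, '$.city.busses') IS NOT NULL
  AND ARRAY_LENGTH(JSON_EXTRACT_ARRAY(data, '$.city.busStops')) > 5;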

Upvotes: 1

Kolban

Reputation: 15266

It feels like Google's BigQuery managed database might be of value to you. The streaming quota documentation indicates a soft limit of 100,000 rows per second and a maximum of 10,000 rows per single insert request. For queries, BigQuery advertises itself as being able to process petabyte-sized tables in acceptable time.

Here is a link to the main page for BigQuery:

https://cloud.google.com/bigquery/

Upvotes: 0
