Reputation: 3189
I have a large 1.5 GB data file with multiple fields separated by tabs.
I need to do lookups in this file from a web interface via AJAX queries, like an API, possibly with a large number of AJAX requests coming in each second, so responses need to be fast.
What is the fastest option for retrieving this data? Is there performance-tested info, benchmarking?
The tab-separated CSV file is a flat file that would have to be loaded into memory, and it cannot carry an index.
JSON is more verbose, but an 'indexed' JSON can be created by grouping entries by a certain field.
Upvotes: 1
Views: 2180
Reputation: 198314
Neither. They are both horrible for your stated purpose. JSON cannot be partially loaded; TSV can be scanned without loading it into memory, but only with sequential access. Use a proper database.
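For example, even an embedded database like SQLite gives you indexed point lookups with no server to run. A minimal sketch (file names, column names, and the two-column layout are made up for illustration):

```python
import sqlite3

# Tiny sample TSV written inline so the sketch is self-contained;
# in practice you would stream your real 1.5 GB file instead.
with open("data.tsv", "w", encoding="utf-8") as f:
    f.write("a1\tfoo\na2\tbar\n")

conn = sqlite3.connect(":memory:")  # use a real file path for persistence
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, value TEXT)")

# Stream the TSV in; PRIMARY KEY gives an index on id automatically.
with open("data.tsv", encoding="utf-8") as f:
    rows = (line.rstrip("\n").split("\t") for line in f)
    conn.executemany("INSERT INTO records (id, value) VALUES (?, ?)", rows)
conn.commit()

# Each lookup is then an indexed point query instead of a full scan.
row = conn.execute("SELECT value FROM records WHERE id = ?", ("a2",)).fetchone()
print(row[0])  # bar
```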
If, for some reason, you can't use a database, you can McGyver[1] it by using TSV or JSONL (not JSON) plus an additional index file that maps each ID (or another searchable field) to the byte position of the start of its record.
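The index-file idea can be sketched like this (a toy TSV is written inline so it runs standalone; the real file would be your 1.5 GB one, and the assumption here is that the ID is the first field on each line):

```python
import json

# Sample data standing in for the large TSV file.
with open("data.tsv", "wb") as f:
    f.write(b"a1\tfoo\na2\tbar\na3\tbaz\n")

# Build the index once: map each ID to the byte offset of its record.
index = {}
with open("data.tsv", "rb") as f:
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            break
        index[line.split(b"\t", 1)[0].decode()] = pos

with open("data.idx.json", "w") as out:
    json.dump(index, out)

def lookup(key, offsets):
    """Seek straight to one record; the big file is never scanned."""
    pos = offsets.get(key)
    if pos is None:
        return None
    with open("data.tsv", "rb") as f:
        f.seek(pos)
        return f.readline().decode().rstrip("\n").split("\t")

# Load the (small) index into memory; keep it resident between requests.
with open("data.idx.json") as idx:
    offsets = json.load(idx)
print(lookup("a2", offsets))  # ['a2', 'bar']
```

Only the index needs to fit in memory; each request costs one `seek` plus one line read.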
1. to McGyver - a colloquial expression that refers to finding a creative and resourceful solution to a problem, often by using unconventional or readily available materials. The term is derived from the television show "MacGyver", in which the main character was known for his ingenuity in solving complex problems with simple materials. In the context of this question, "McGyver it" means to devise an alternative method for organizing and accessing data when a traditional database is not available.
Upvotes: 2