azazelspeaks
azazelspeaks

Reputation: 6045

Python: Local Database for storing and retrieving historical data

I'm trying to create some weather models and I want to store and retrieve data on my hard drive.

Data is in this format:

{'Date_Time':'2020-07-18 18:16:17','Temp':29.0, 'Humidity':45.3}
{'Date_Time':'2020-07-18 18:18:17','Temp':28.9, 'Humidity':45.4}
{'Date_Time':'2020-07-18 18:20:17 ','Temp':28.8, 'Humidity':48.3}

I have new data coming in every day, I have old data from ~5 years ago. I would like to periodically merge the data sets and create one large data set to manipulate.

Things I need:

1. Check if the date-time pair already exists, else add new data
2. Change old data values
3. Add new data values to the database
4. Must be on a local storage, I have plenty of space.

Things I would like but do not need:

1. Fastest Read access possible, not so concerned about storage time as that happens in the background mostly.
2. Something that makes searching for all data from today, last 7 days etc easy to retrieve

Things I have tried:

  1. Appending to a json file

    Works for now but is slow because I have to load the entire data set every time I want to append/modify

  2. Appending to a text file

    Easy to store, but hard to modify/check values

  3. SQLLite3

    I looked into this and it seemed workable, just wanted to know if there was something better before I just go ahead and do this.

Thank you for your help!

Upvotes: 0

Views: 7368

Answers (2)

Prashanth Mariswamy
Prashanth Mariswamy

Reputation: 352

The selection of JSON vs TXT vs SQL or NoSQL DB would be based on your current and future requirements.

  1. From your inputs, you have data for last 5 years and the data from the example is for every 2 seconds. Based on this, it seems like you will have a large dataset or will need to prune the dataset frequently. For large datasets, using a SQL or NoSQL DB would be ideal so that you do not load all data to memory for every read/write operation.
  2. Using the date-time as your primary key, you would be able to read-write pretty quickly using a database.
  3. Using SQLLite is a good start but if your data is going to grow, you should plan to move to an external SQL/NoSQL database.
  4. Seeing that your data is mostly time based, it would be good to evaluate Time Series database like InfluxDB or Graphite.

Upvotes: 2

Hugh
Hugh

Reputation: 380

Not sure whether it's "better" but json_database seems to do what you're looking for:

  • save and load from file
  • search recursively by key and key/value pairs
  • fuzzy search
  • supports arbitrary objects

Upvotes: 1

Related Questions