Reputation: 11
# Background
I am currently playing with a web scraping project while learning Python. The project scrapes products with information such as price using Selenium. I then add every record to a pandas DataFrame, do some additional data manipulation, and store the data in a CSV which I upload to Google Drive. This runs every night.
# Question itself
I would like to track price changes, new products, etc. How would you recommend storing the data with a date key, so that there is a way to flag new products? My idea is to store every load in one CSV and add a column "date_of_load"... but this seems noob-like... Maybe store the data in PostgreSQL instead? I would like to start learning SQL anyway, so I would try building my own DB.
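A minimal sketch of that date_of_load idea with pandas (column names and sample values here are made up for illustration):

```python
import pandas as pd
from datetime import date

# Hypothetical rows from tonight's scraping run.
today = pd.DataFrame({
    "product_id": ["A1", "B2", "C3"],
    "price": [9.99, 14.50, 3.25],
})
today["date_of_load"] = date.today().isoformat()

# Previous loads, accumulated in one CSV (use pd.read_csv in the real job).
history = pd.DataFrame({
    "product_id": ["A1", "B2"],
    "price": [9.99, 12.00],
    "date_of_load": ["2024-01-01", "2024-01-01"],
})

# Flag products never seen in any earlier load.
today["is_new"] = ~today["product_id"].isin(history["product_id"])

# Append tonight's load to the history and write it back out.
combined = pd.concat([history, today], ignore_index=True)
```

Comparing each nightly load against the accumulated history like this also gives you price changes for free: group `combined` by `product_id` and diff the `price` column.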
Thanks for your ideas
Upvotes: 0
Views: 400
Reputation: 106
That is cool! I would suggest sqlite3 (https://docs.python.org/3/library/sqlite3.html) just to get a feel for SQL. As the docs say, "It’s also possible to prototype an application using SQLite and then port the code to a larger database such as PostgreSQL or Oracle", which is roughly what you suggested, so it could be a nice place to start.
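A minimal sketch of that with the stdlib sqlite3 module (table layout and sample rows are assumptions; use a file path instead of `:memory:` in the real job):

```python
import sqlite3
from datetime import date

# In-memory DB for the sketch; connect to e.g. "prices.db" for persistence.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE IF NOT EXISTS prices (
        product_id   TEXT,
        price        REAL,
        date_of_load TEXT
    )
""")

# Hypothetical rows from one scraping run.
rows = [("A1", 9.99), ("B2", 14.50)]
today = date.today().isoformat()
con.executemany(
    "INSERT INTO prices (product_id, price, date_of_load) VALUES (?, ?, ?)",
    [(pid, price, today) for pid, price in rows],
)
con.commit()

# Products whose earliest load date is today are the "new" ones.
new_products = con.execute(
    "SELECT product_id FROM prices GROUP BY product_id HAVING MIN(date_of_load) = ?",
    (today,),
).fetchall()
```

Since every row carries its load date, price-change queries are a `GROUP BY product_id` away, and the schema ports to PostgreSQL almost unchanged.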
However, CSV might do just fine. As long as there is not so much data that loading and processing it takes forever, it doesn't matter much how you store it, as long as you can work with it the way you want.
Upvotes: 2
Reputation: 2804
For this task I would rather use NoSQL (MongoDB). You can store the price data as JSON documents whose keys are dates.
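A sketch of that date-keyed structure with plain dicts and the json module (the document shape is an assumption; inserting it with pymongo would look essentially the same):

```python
import json
from datetime import date

# One document per product, with prices nested under ISO date keys.
doc = {
    "product_id": "A1",
    "prices": {
        "2024-01-01": 9.99,
        "2024-01-02": 10.49,
    },
}

# Recording tonight's load is a single key assignment.
doc["prices"][date.today().isoformat()] = 10.99

# The dict serializes straight to JSON, so it maps directly
# onto a MongoDB document.
payload = json.dumps(doc)
```

A new product is then simply a document that did not exist before, and a price change is visible by comparing consecutive date keys within one document.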
Upvotes: 3