Reputation: 105
I have a large JSON file from a web-scraping project I've been working on for a while. Now I'm trying to build a web frontend on top of the JSON data, but I'm having a hard time figuring out the best way to go about it.
The JSON file looks like this:
{
    "_id" : { "$oid" : "55d5c85a96cc6212bdd4ca08" },
    "name" : "Example",
    "url" : "http://example.com/blahblah",
    "ts" : { "$date" : 1073423706824 }
}
I have a few questions:
1. The JSON file will be added to over time, so would the best solution be to regularly import it into a database, or just keep the JSON file in the cloud somewhere and pull from it when needed?
2. If I put it in a database, how could I regularly add to the database without slowing down the front end of the site? I know I could use something like json_decode, but I've mostly only seen examples with a few lines of JSON; could it handle larger JSON files?
3. If I put it in a database, would a relational DB be faster/more efficient, or something like MongoDB?
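On the "larger JSON files" worry: parsing the whole file at once doesn't scale, but if the dump is line-delimited (one document per line, which is how mongoexport-style scrapes are often stored), it can be streamed record by record. A minimal Python sketch, assuming that layout:

```python
import json

def iter_records(path):
    # Assumes one JSON document per line (JSON Lines), so the whole
    # file never has to fit in memory at once.
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Each yielded record is then a plain dict like the sample document above, ready to be inserted into whatever store is chosen.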
Upvotes: 1
Views: 782
Reputation: 15058
After doing a lot of web scraping myself, here's what I would recommend:
1. Decide between a relational and a non-relational database. If your data is constantly changing, with an unknown number of parameters, I recommend MongoDB (its documents are almost JSON, and it's fully schemaless, so it's easy to add new facets). If your data is all the same format, then a relational DB is a good step forward; PostgreSQL and MariaDB are good open-source options.
2. Convert your current JSON data into the chosen DB format and insert it.
3. Start scraping straight to the DB; try not to use JSON files any more.
4. Read from the database for your front end. If you're using Python, Flask is a good option to look at.
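The conversion and read-back steps above might look like the following for the relational route. This is a minimal sketch using SQLite as a stand-in for PostgreSQL/MariaDB; the `$oid`/`$date` wrappers in the sample record are MongoDB extended JSON and get flattened first, and the `pages` table and its columns are just placeholders:

```python
import json
import sqlite3
from datetime import datetime, timezone

def flatten(record):
    # MongoDB extended-JSON wrappers: {"$oid": ...} -> str,
    # {"$date": ms-since-epoch} -> datetime
    out = dict(record)
    if isinstance(out.get("_id"), dict):
        out["_id"] = out["_id"].get("$oid")
    if isinstance(out.get("ts"), dict):
        out["ts"] = datetime.fromtimestamp(out["ts"]["$date"] / 1000, tz=timezone.utc)
    return out

conn = sqlite3.connect(":memory:")  # swap in your real DB connection
conn.execute("CREATE TABLE pages (id TEXT PRIMARY KEY, name TEXT, url TEXT, ts TEXT)")

raw = ('{"_id": {"$oid": "55d5c85a96cc6212bdd4ca08"}, "name": "Example", '
       '"url": "http://example.com/blahblah", "ts": {"$date": 1073423706824}}')
doc = flatten(json.loads(raw))
conn.execute(
    "INSERT INTO pages (id, name, url, ts) VALUES (?, ?, ?, ?)",
    (doc["_id"], doc["name"], doc["url"], doc["ts"].isoformat()),
)

# The front end then reads from the DB, not from JSON files
row = conn.execute(
    "SELECT name, url FROM pages WHERE id = ?", ("55d5c85a96cc6212bdd4ca08",)
).fetchone()
```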
There is also a really interesting question, Store static data in an array or in a database, posted previously, with some in-depth answers on static files vs. databases.
If you take static files out of the equation and use a database, here are the answers to your 3 questions:
1. Just use the database.
2. Adding to the database is simple. Once you've got it set up, your scraper can write straight to it with the relevant driver. Again, no need for JSON files.
3. It all depends on your data.
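For the second point (writing straight to the DB with the relevant driver), here is a minimal sketch of the MongoDB route. The `scrape_page` helper and the database/collection names are placeholders, and the actual insert (commented out) needs a running MongoDB plus the pymongo driver:

```python
from datetime import datetime, timezone

def scrape_page(url):
    # Placeholder for your real scraping logic
    return {"name": "Example", "url": url}

def make_doc(url):
    # Build the document the scraper writes; MongoDB adds _id itself,
    # and storing a real datetime avoids the {"$date": ...} wrapper.
    doc = scrape_page(url)
    doc["ts"] = datetime.now(timezone.utc)
    return doc

doc = make_doc("http://example.com/blahblah")

# With a running MongoDB, the scraper inserts directly via pymongo:
#   from pymongo import MongoClient
#   MongoClient()["scraper"]["pages"].insert_one(doc)
```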
Upvotes: 2