zenCoder

Reputation: 740

How to store billions of JSON files and query them

I currently have an API that accepts JSON files (JSON-serialised objects containing user transaction data) and stores them on the server. Every JSON file has a globally unique id and is associated with exactly one user. Billions of such files are generated every day. A user should then be able to query all the JSON files associated with them and get aggregated results calculated over those files.

A typical JSON file that needs to be stored looks something like:

[ { "currencyCode" : "INR",
"receiptNumber" : { "value" : "1E466GDX5X2C" },
"retailTransaction" : [ { "grandTotal" : 90000.0,
      "lineItem" : [ { "otherAttributes" : {  },
            "sale" : { "description" : "Samsung galaxy S3",
                "discountAmount" : { "currency" : "INR",
                    "value" : 2500
                  },
                "itemSubType" : "SmartPhone",
                "otherAttributes" : {  },
                "unitCostPrice" : { "quantity" : 1,
                    "value" : 35000
                  }
              },
            "sequenceNumber" : 1000
          },
          { "customerOrderForPickup" : { "description" : "iPhone5",
                "discountAmount" : { "currency" : "INR",
                    "value" : 5000
                  },
                "itemSubType" : "SmartPhone",
                "otherAttributes" : {  },
                "unitCostPrice" : { "quantity" : 1,
                    "value" : 55000
                  }
              },
            "otherAttributes" : {  },
            "sequenceNumber" : 1000
          }
        ],
      "otherAttributes" : {  },
      "reason" : "Delivery",
      "total" : [ { "otherAttributes" : {  },
            "type" : "TransactionGrossAmount",
            "value" : 35000
          } ]
    },
    null
  ],
"sequenceNumber" : 125435,
"vatRegistrationNumber" : "10868758650"
} ]

The above JSON is the serialised form of a complex object whose attributes are single objects, or arrays of objects, of other classes. The 'receiptNumber' is the universal id of the JSON file.

I need to query things like the quantity and value of the customerOrderForPickup, or the grandTotal of the transaction, and as an aggregate across many such transaction JSONs.
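To make the kind of aggregate concrete, here is a minimal Python sketch. It assumes each stored file parses to the array-of-receipts shape shown above; the `summarize` function and its output keys are hypothetical names chosen for illustration, and the sample is a trimmed copy of the JSON above.

```python
import json

# Trimmed-down document in the shape shown above (values copied from the sample).
SAMPLE = json.loads("""
[ { "receiptNumber": { "value": "1E466GDX5X2C" },
    "retailTransaction": [
      { "grandTotal": 90000.0,
        "lineItem": [
          { "sale": { "unitCostPrice": { "quantity": 1, "value": 35000 } } },
          { "customerOrderForPickup": { "unitCostPrice": { "quantity": 1, "value": 55000 } } }
        ] },
      null
    ] } ]
""")


def summarize(documents):
    """Sum grandTotal and customerOrderForPickup quantity/value across documents."""
    summary = {"grand_total": 0.0, "pickup_quantity": 0, "pickup_value": 0}
    for document in documents:
        for receipt in document:
            for transaction in receipt.get("retailTransaction") or []:
                if transaction is None:  # the sample contains a null entry
                    continue
                summary["grand_total"] += transaction.get("grandTotal", 0.0)
                for item in transaction.get("lineItem") or []:
                    pickup = item.get("customerOrderForPickup")
                    if pickup is None:
                        continue
                    price = pickup.get("unitCostPrice", {})
                    summary["pickup_quantity"] += price.get("quantity", 0)
                    summary["pickup_value"] += price.get("value", 0)
    return summary


# Two copies stand in for two stored files belonging to the same user.
totals = summarize([SAMPLE, SAMPLE])
print(totals)
```

Whatever storage is chosen, this is the computation it has to support per user, ideally without loading every file into memory the way this sketch does.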

I would like some suggestions on how to go about:
1) Storing these JSON files on the server, i.e. on the file system
2) Choosing what kind of database to use for querying JSON files with such a complex structure

My research has turned up a couple of possibilities:
1) Use a MongoDB database to store the JSON representations of the objects and query through the database. How would the JSON files be stored? What would be the best way to store the transaction JSONs in MongoDB?
2) Couple a SQL database, containing the unique global id, the user id and the path of the JSON file on the server, with aggregation code that runs over those files. I doubt this can scale.
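For reference, option 2 could look roughly like the Python sketch below. It is only an illustration under assumptions: `sqlite3` stands in for whatever SQL database would actually be used, and the `receipt_index` table, the `store` and `grand_total_for_user` helpers, and the toy documents are all hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def store(conn, root, user_id, document):
    """Write one JSON document to disk and index it by receipt id and user."""
    receipt_id = document[0]["receiptNumber"]["value"]
    path = Path(root) / f"{receipt_id}.json"
    path.write_text(json.dumps(document))
    conn.execute(
        "INSERT INTO receipt_index (receipt_id, user_id, path) VALUES (?, ?, ?)",
        (receipt_id, user_id, str(path)),
    )


def grand_total_for_user(conn, user_id):
    """Re-read every file indexed for the user and sum grandTotal."""
    total = 0.0
    rows = conn.execute(
        "SELECT path FROM receipt_index WHERE user_id = ?", (user_id,)
    )
    for (path,) in rows:
        for receipt in json.loads(Path(path).read_text()):
            for transaction in receipt.get("retailTransaction") or []:
                if transaction is not None:
                    total += transaction.get("grandTotal", 0.0)
    return total


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipt_index ("
    "receipt_id TEXT PRIMARY KEY, user_id TEXT NOT NULL, path TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_receipt_user ON receipt_index (user_id)")

with tempfile.TemporaryDirectory() as root:
    # Toy documents: (receipt id, user id, grandTotal).
    for receipt_id, user_id, amount in [
        ("R1", "alice", 90000.0),
        ("R2", "alice", 1500.0),
        ("R3", "bob", 42.0),
    ]:
        document = [{
            "receiptNumber": {"value": receipt_id},
            "retailTransaction": [{"grandTotal": amount}, None],
        }]
        store(conn, root, user_id, document)
    alice_total = grand_total_for_user(conn, "alice")
    bob_total = grand_total_for_user(conn, "bob")

print(alice_total, bob_total)
```

The index lookup is cheap, but every aggregate re-reads and re-parses all of the user's files, which is the part I doubt will scale.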

Would be glad if someone has any insights on the problem. Thanks.

Upvotes: 1

Views: 3808

Answers (1)

Dennis Puzak

Reputation: 3736

I would say your question is very general and really a matter of style and preferences. You could do this in 10 different ways and every one would be perfectly good.

I'm gonna give my personal preference and how I would do it:

Since there is a lot of data, I would use a relational database: SQL Server. I like Microsoft tools and ASP.NET MVC (I know there are a lot of people who don't, but it's my preference), and it has a serializer that can turn JSON into C# objects. I also like to use Entity Framework, which can translate C# objects into database rows, so I would just structure the database the same way my JSON object looks. I would then have an API that accepts those JSON entities; ASP.NET MVC would automatically turn them into C# objects and Entity Framework would automatically turn them into database rows. This way the whole upload API wouldn't take more than a few lines of code to make.

I would then add more API methods for the different kinds of queries. LINQ and Entity Framework make such queries easy, sometimes as little as one line of code.
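To show the idea without tying it to a particular stack, here is a rough Python sketch of "structure the database like the JSON, then query it". It is not the SQL Server / Entity Framework setup itself: `sqlite3` stands in for the database, and the table names, column names and the `insert_receipt` helper are assumptions made up for illustration, with the sample values taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table per nesting level of the JSON (hypothetical schema).
conn.executescript("""
CREATE TABLE receipt (
    receipt_number TEXT PRIMARY KEY,
    user_id        TEXT NOT NULL,
    currency_code  TEXT NOT NULL
);
CREATE TABLE retail_transaction (
    id             INTEGER PRIMARY KEY,
    receipt_number TEXT NOT NULL REFERENCES receipt (receipt_number),
    grand_total    REAL NOT NULL
);
CREATE TABLE line_item (
    id             INTEGER PRIMARY KEY,
    transaction_id INTEGER NOT NULL REFERENCES retail_transaction (id),
    kind           TEXT NOT NULL,   -- 'sale' or 'customerOrderForPickup'
    description    TEXT,
    quantity       INTEGER NOT NULL,
    unit_value     REAL NOT NULL
);
""")

ITEM_KINDS = ("sale", "customerOrderForPickup")


def insert_receipt(conn, user_id, receipt):
    """Flatten one receipt object (shaped like the question's sample) into rows."""
    number = receipt["receiptNumber"]["value"]
    conn.execute(
        "INSERT INTO receipt VALUES (?, ?, ?)",
        (number, user_id, receipt["currencyCode"]),
    )
    for transaction in receipt.get("retailTransaction") or []:
        if transaction is None:  # the sample contains a null entry
            continue
        cursor = conn.execute(
            "INSERT INTO retail_transaction (receipt_number, grand_total) "
            "VALUES (?, ?)",
            (number, transaction["grandTotal"]),
        )
        transaction_id = cursor.lastrowid
        for item in transaction.get("lineItem") or []:
            for kind in ITEM_KINDS:
                if kind in item:
                    detail = item[kind]
                    price = detail["unitCostPrice"]
                    conn.execute(
                        "INSERT INTO line_item "
                        "(transaction_id, kind, description, quantity, unit_value) "
                        "VALUES (?, ?, ?, ?, ?)",
                        (transaction_id, kind, detail.get("description"),
                         price["quantity"], price["value"]),
                    )


receipt = {
    "currencyCode": "INR",
    "receiptNumber": {"value": "1E466GDX5X2C"},
    "retailTransaction": [
        {"grandTotal": 90000.0,
         "lineItem": [
             {"sale": {"description": "Samsung galaxy S3",
                       "unitCostPrice": {"quantity": 1, "value": 35000}}},
             {"customerOrderForPickup": {
                 "description": "iPhone5",
                 "unitCostPrice": {"quantity": 1, "value": 55000}}},
         ]},
        None,
    ],
}
insert_receipt(conn, "user-1", receipt)

# Aggregate for one user: total quantity and value of pickup orders.
pickup_quantity, pickup_value = conn.execute(
    """
    SELECT SUM(li.quantity), SUM(li.quantity * li.unit_value)
    FROM line_item AS li
    JOIN retail_transaction AS rt ON rt.id = li.transaction_id
    JOIN receipt AS r ON r.receipt_number = rt.receipt_number
    WHERE r.user_id = ? AND li.kind = 'customerOrderForPickup'
    """,
    ("user-1",),
).fetchone()
print(pickup_quantity, pickup_value)
```

Once the data is in rows like this, each new kind of report is just another query, which is the same convenience LINQ gives you on the C# side.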

Upvotes: 3
