Reputation: 448
I am trying to deserialize a file containing structured data with C#, export it to CSV and ultimately insert it into an MSSQL database.
The file looks a bit like JSON but only the individual lines are valid JSON.
Example:
{"field1": "value1", "field2": "value2", "field3": "value3"}
{"field1": "value4", "field2": "value5", "field3": "value6"}
{"field1": "value7", "field2": "value8", "field3": "value9"}
I tried to use Newtonsoft.Json, looping over the individual lines, but the execution takes a very long time this way because the file given to me is very large (several million lines).
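Roughly what that loop looks like at the moment (the Record class, the property names and the file name are just placeholders):

using System.IO;
using Newtonsoft.Json;

// File.ReadLines streams the file lazily, so memory stays flat,
// but deserializing millions of small strings one by one is still slow.
foreach (var line in File.ReadLines("input.txt"))
{
    if (string.IsNullOrWhiteSpace(line)) continue;
    var record = JsonConvert.DeserializeObject<Record>(line);
    // ... write to CSV / collect for the database insert ...
}

class Record
{
    public string field1 { get; set; }
    public string field2 { get; set; }
    public string field3 { get; set; }
}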
Alternatively, I tried to make the whole thing valid JSON by altering the string with a StringBuilder, but that means loading the entire file into RAM at once, which does not seem like a reasonable option at all.
I even thought about splitting the file into chunks that are then processed concurrently, but I assume there must be a cleaner way to do this?
Is this a format that is standardized in some way? What would be the smartest way to go about importing it? Should I skip the CSV and use C# to insert the data directly into the database?
Any help appreciated!
Upvotes: 0
Views: 58
Reputation: 31
Given the size of the file, I'd be tempted to create a custom stream (see Implement custom stream) that wraps the real file stream so that the first bytes returned are "{items: [", then the bytes of the real stream (with the newline after each object turned into a comma), and finally "]}". That way you could use your Newtonsoft.Json library without building the entire corrected JSON in memory.
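A minimal sketch of that idea, assuming a UTF-8 file without a BOM; the class name JsonArrayWrapperStream and the "items" property are only illustrative:

using System;
using System.IO;
using System.Text;

// Read-only stream that yields '{"items": [' first, then the underlying
// file's bytes with each line break turned into a comma (raw newlines can
// never occur inside JSON string values, so the byte swap is safe), and
// finally ']}'.
public class JsonArrayWrapperStream : Stream
{
    private readonly Stream _inner;
    private readonly byte[] _prefix = Encoding.UTF8.GetBytes("{\"items\": [");
    private readonly byte[] _suffix = Encoding.UTF8.GetBytes("]}");
    private int _prefixPos, _suffixPos;
    private bool _innerDone;

    public JsonArrayWrapperStream(Stream inner) { _inner = inner; }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int written = 0;

        // 1. Emit the opening '{"items": ['.
        while (_prefixPos < _prefix.Length && written < count)
            buffer[offset + written++] = _prefix[_prefixPos++];

        // 2. Pass the file through, replacing line breaks with commas.
        if (!_innerDone && written < count)
        {
            int read = _inner.Read(buffer, offset + written, count - written);
            if (read == 0)
            {
                _innerDone = true;
            }
            else
            {
                for (int i = 0; i < read; i++)
                {
                    int p = offset + written + i;
                    if (buffer[p] == (byte)'\n') buffer[p] = (byte)',';
                    else if (buffer[p] == (byte)'\r') buffer[p] = (byte)' ';
                }
                written += read;
            }
        }

        // 3. Emit the closing ']}'. A trailing newline in the file leaves a
        //    trailing comma before the ']', which Json.NET tolerates.
        if (_innerDone)
            while (_suffixPos < _suffix.Length && written < count)
                buffer[offset + written++] = _suffix[_suffixPos++];

        return written;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

With that in place you can hand the wrapped stream to a StreamReader/JsonTextReader as usual. If you also want to keep the deserialized objects out of memory, you can walk the array with JsonTextReader and call serializer.Deserialize<YourRecord>(reader) once per element instead of materialising the whole list.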
Upvotes: 2