Reputation: 3874
I have a file of close to 800 MB that consists of several records, each a header followed by content.
A header looks something like this: M=013;X=rast;645.jpg
The content is the binary data of the jpg file.
So the file looks something like this:
M=013;X=rast;645.jpgNULœDüŠˆ.....M=217;X=rast;113.jpgNULÿñÿÿ&åbÿås....M=217;X=rast;1108.jpgNUL]_ÿ×ÉcË/...
The header can occur on one line or span two lines.
I need to parse this file and extract the individual jpg images.
Since the file is so large, can you suggest an efficient way to do this? I was hoping to use StreamReader, but I do not have much experience with the regular expressions I would need to use with it.
Upvotes: 0
Views: 172
Reputation: 10169
RegEx:
/(M=.+?;X=.+?;.+?\.jpg)(.+?(?=(?1)|$))/gs
This version uses the recursion group (?1), which is not supported in .NET.
.NET RegEx workaround:
/(M=.+?;X=.+?;.+?\.jpg)(.+?(?=M=.+?;X=.+?;.+?\.jpg|$))/gs
Here the (?1) recursion group is replaced with the contents of the 1st capture group.
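The substitution described above can be done mechanically by keeping the header sub-pattern in one variable and using it twice. A small sketch in Python (not .NET; the variable names are mine) that builds the workaround pattern from the recursive one:

```python
# Header sub-pattern: the contents of the 1st capture group.
header = r"M=.+?;X=.+?;.+?\.jpg"

# Original form, with the (?1) call inside the lookahead.
with_recursion = r"(" + header + r")(.+?(?=(?1)|$))"

# Workaround for engines without (?1): paste the sub-pattern in its place.
workaround = with_recursion.replace("(?1)", header)

print(workaround)
```

Python's built-in `re` module also lacks `(?1)`, so the same inlined form is the one to use there.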
Live demo and Explanation of RegExp: http://regex101.com/r/nQ3pE0/1
You'll want to use the 2nd capture group for the binary contents. The 1st group matches the header; the expression also needs the header pattern to know where each image's data stops.
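To make the role of the two groups concrete, here is a minimal sketch in Python rather than .NET, so treat it as an illustration of the idea and not a drop-in solution. It applies the workaround pattern to raw bytes and memory-maps the file instead of reading all of it into one buffer. The helper names `split_images` and `extract_to_dir` are hypothetical, and two things are assumptions taken from the question's sample: a single NUL byte separates header and image data, and the last `;`-separated header field is the file name.

```python
import mmap
import re
from pathlib import Path

# The workaround pattern, compiled for bytes. re.DOTALL plays the role of
# the /s flag; finditer() plays the role of /g.
HEADER = rb"M=.+?;X=.+?;.+?\.jpg"
RECORD = re.compile(rb"(" + HEADER + rb")(.+?(?=" + HEADER + rb"|$))", re.DOTALL)


def split_images(data):
    """Yield (header, image_bytes) pairs from a bytes-like object."""
    for match in RECORD.finditer(data):
        header = match.group(1)  # 1st group: the header text
        body = match.group(2)    # 2nd group: everything up to the next header
        # Assumption: one NUL byte sits between header and image data.
        if body[:1] == b"\x00":
            body = body[1:]
        yield header, body


def extract_to_dir(path, out_dir):
    """Memory-map the container file and write each image into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    with open(path, "rb") as fh, \
            mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for header, body in split_images(mm):
            # Assumption: the last ';' field (e.g. b"645.jpg") is the name.
            # Line breaks are dropped because a header may span two lines.
            field = header.rsplit(b";", 1)[-1]
            name = field.replace(b"\r", b"").replace(b"\n", b"").decode("ascii")
            (out / name).write_bytes(body)
            names.append(name)
    return names
```

One caveat: because `.` matches any byte under the `s` flag, a stray `M=` inside image data could be mistaken for the start of a header, so the results are worth checking against the number of images you expect. In .NET the corresponding option is `RegexOptions.Singleline`.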
Upvotes: 1