Reputation: 1505
I'm trying to parse a CRG file in C#. The file mixes plain text and binary data: the first section is plain text, and the rest is binary (lots of floats). Here's an example:
$
$ROAD_CRG
reference_line_start_u = 100
reference_line_end_u = 120
$
$KD_DEFINITION
#:KRBI
U:reference line u,m,730.000,0.010
D:reference line phi,rad
D:long section 1,m
D:long section 2,m
D:long section 3,m
...
$
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
�@z����RA����\�l
...
I know I can read bytes starting at a specific offset, but how do I find out which byte to start from? The last line before the binary section always contains at least four dollar signs ("$$$$"). Here's what I've got so far:
using var fs = new FileStream(@"crg_sample.crg", FileMode.Open, FileAccess.Read);
var startByte = ??; // How to find out where to start?
using (BinaryReader reader = new BinaryReader(fs))
{
    reader.BaseStream.Seek(startByte, SeekOrigin.Begin);
    var f = reader.ReadSingle();
    Debug.WriteLine(f);
}
Upvotes: 3
Views: 1992
Reputation: 1505
Solution
I know this is far from the most optimized solution, but it did the trick in my case. Since the plain-text section of the file was known to be fairly small, it didn't cause any noticeable performance issues. Here's the code:
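using System.IO;
using System.Text;

using var fileStream = new FileStream(@"crg_sample.crg", FileMode.Open, FileAccess.Read);
// The header is plain ASCII; with this encoding ReadChar consumes one byte per character.
using var reader = new BinaryReader(fileStream, Encoding.ASCII);
var newLine = '\n';
var markerString = "$$$$";
var currentString = "";
var foundMarker = false;
var foundNewLine = false;
// ReadChar throws EndOfStreamException if the file ends before the marker line does.
while (!foundNewLine)
{
    var c = reader.ReadChar();
    if (!foundMarker)
    {
        // Keep a sliding window of the last four characters read.
        currentString += c;
        if (currentString.Length > markerString.Length)
            currentString = currentString.Substring(1);
        if (currentString == markerString)
            foundMarker = true;
    }
    else
    {
        // Skip the rest of the marker line; the binary data starts after its newline.
        if (c == newLine)
            foundNewLine = true;
    }
}
if (foundNewLine)
{
    // Read binary
}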
Note: If you're dealing with larger or more complex files, you should probably take a look at Marc Gravell's answer and the comment sections.
Upvotes: 0
Reputation: 567
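UPDATE: This code may not work as expected. StreamReader reads ahead into its own internal buffer (even when a buffer size of 1 is requested), so the position of the underlying stream after the loop is not guaranteed to be the start of the binary data. Please review the valuable information in the comments.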
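using (var fs = new FileStream(@"crg_sample.crg", FileMode.Open, FileAccess.Read))
{
    // leaveOpen: true keeps fs usable after the StreamReader is disposed.
    using (StreamReader sr = new StreamReader(fs, Encoding.ASCII, true, 1, true))
    {
        var line = sr.ReadLine();
        while (!string.IsNullOrWhiteSpace(line) && !line.Contains("$$$$"))
        {
            line = sr.ReadLine();
        }
    }
    // Caveat: the StreamReader may already have consumed bytes past the marker line,
    // so fs.Position is not necessarily the first byte of the binary section here.
    using (BinaryReader reader = new BinaryReader(fs))
    {
        // TODO: Start reading the binary data
    }
}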
Upvotes: 0
Reputation: 1062492
When you have a mixture of text data and binary data, you need to treat everything as binary. This means you should be using raw Stream access, or something similar, and using binary APIs to look through the text data (often looking for the CR/LF/CRLF bytes as sentinels, although it sounds like in your case you could just look for the $$$$ using binary APIs, then decode the entire block before, and scan forwards). When you think you have an entire line, then you can use Encoding to parse each line - the most convenient API being encoding.GetString().
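As a minimal sketch of that idea, assuming the whole file is already in a byte[] and that the first "$$$$" run only appears on the separator line (as the question describes), the marker can be located with byte-level searches and the header decoded afterwards. The class and method names here (CrgHeader, FindBinaryStart, ReadHeaderLines) are made up for illustration and are not part of any library:

```csharp
using System;
using System.Text;

public static class CrgHeader
{
    // Returns the index of the first byte after the line containing "$$$$",
    // or -1 if the marker or its terminating '\n' is not found.
    public static int FindBinaryStart(byte[] data)
    {
        byte[] marker = Encoding.ASCII.GetBytes("$$$$");
        int markerPos = data.AsSpan().IndexOf(marker);
        if (markerPos < 0)
            return -1;

        // The separator line may hold more than four '$'; skip to its newline.
        int newline = Array.IndexOf(data, (byte)'\n', markerPos);
        return newline < 0 ? -1 : newline + 1;
    }

    // Decodes everything before the binary section and splits it into lines.
    // Assumes the header is plain ASCII with LF or CRLF line endings.
    public static string[] ReadHeaderLines(byte[] data, int binaryStart)
    {
        string text = Encoding.ASCII.GetString(data, 0, binaryStart);
        return text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Because the search runs over raw bytes, nothing is decoded (or silently buffered) past the marker line, and the returned offset can be used directly as the start of the float data.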
When you've finished looking through the text data as binary, then you can continue parsing the binary data, again using the binary API. I would usually recommend against BinaryReader here too, because frankly it doesn't gain you much over more direct API. The other problem you might want to think about is CPU endianness, but assuming that isn't a problem: BitConverter.ToSingle() may be your friend.
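A rough sketch of reading the floats with BitConverter.ToSingle() while still accounting for endianness could look like the following. The byte order of the file is an assumption the caller has to supply (check the format specification); FloatBlock and ReadSingles are hypothetical names used only for this example:

```csharp
using System;

public static class FloatBlock
{
    // Reads `count` 32-bit floats from `data`, starting at `offset`.
    // `bigEndian` describes the byte order used in the file (an assumption
    // to verify against the file format, not something detected here).
    public static float[] ReadSingles(byte[] data, int offset, int count, bool bigEndian)
    {
        if (offset < 0 || count < 0 || (long)offset + 4L * count > data.Length)
            throw new ArgumentOutOfRangeException(nameof(count), "Range exceeds the buffer.");

        var result = new float[count];
        var scratch = new byte[4];
        // Swap bytes only when the file's byte order differs from the CPU's.
        bool swap = bigEndian == BitConverter.IsLittleEndian;

        for (int i = 0; i < count; i++)
        {
            Buffer.BlockCopy(data, offset + 4 * i, scratch, 0, 4);
            if (swap)
                Array.Reverse(scratch);
            result[i] = BitConverter.ToSingle(scratch, 0);
        }
        return result;
    }
}
```

Copying each value into a small scratch buffer leaves the original data untouched, at the cost of a little extra work per float.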
If the data is modest in size, you may find it easiest to use byte[] for the data; either via File.ReadAllBytes, or by renting an oversized byte[] from the array-pool, and loading it from a FileStream. The Stream API is awkward for this kind of scenario, because once you've looked at data: it has gone - so you need to maintain your own back-buffers. The pipelines API is ideal for this, when dealing with large data, but is an advanced topic.
Upvotes: 3