Reputation: 63
I have a file that contains several kinds of headers, but only one of them matters to me, along with the data that follows it. This header repeats multiple times throughout the file.
Its magic number is A3046 in ASCII, or 0x65 0x51 0x48 0x54 0x52 in hex.
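The two representations have to describe the same bytes for any search to succeed, so it may be worth checking them against each other. The following is a minimal std-only sketch; the helper name `magic_as_hex` is hypothetical and not part of the original code.

```rust
/// Formats each byte of `magic` as a two-digit hex literal,
/// e.g. the byte for ASCII 'A' becomes "0x41".
fn magic_as_hex(magic: &[u8]) -> Vec<String> {
    magic.iter().map(|b| format!("0x{:02x}", b)).collect()
}
```

Calling `magic_as_hex(b"A3046")` gives 0x41 0x33 0x30 0x34 0x36 (decimal 65 51 48 52 54). The values quoted above therefore look like decimal character codes written with a 0x prefix, with the last two swapped. Which of the two is right depends on the file, so the bytes actually present in the file are the ones to trust.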
After finding the header, the parser has to take all bytes up to 0xff, and then repeat for the remaining headers until the EOF.
First I loaded the file:
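use std::fs::OpenOptions;
use std::io::Read;

let mut file = OpenOptions::new()
    .read(true)
    .open("../assets/sample")
    .unwrap();
let mut full_file: Vec<u8> = Vec::new();
// read_to_end returns a Result; unwrap it so a failed read is not silently ignored
file.read_to_end(&mut full_file).unwrap();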
I declare the magic number with: pub static QT_MAGIC: &[u8; 5] = b"A3046";
As a test, I wrote the following function to see whether it could find the first header.
fn parse_block(input: &[u8]) -> IResult<&[u8], &[u8]> {
    tag(QT_MAGIC)(input)
}
However, when the test runs, Ok has a None value. It definitely should have found something. What am I doing wrong?
I found no examples of byte parsing using nom 5, and being a Rust newbie is not helping either. How can I parse all the blocks with these rules?
Upvotes: 4
Views: 3383
Reputation: 19672
nom version
First off, apologies for this one: the playground only has nom 4.0, so the code is in this GitHub repository.
To parse something like this, we're going to need to combine two different parsers:
take_until, to take bytes until either the preamble or EOF
tag, to isolate the preamble
And a combinator, preceded, so we can ditch the first element of a sequence of parsers.
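To make the roles of these three pieces concrete, here is a std-only sketch of what each one does to a byte slice. These are not nom's implementations: `take_until_std`, `tag_std` and `preceded_std` are hypothetical names that only imitate the behaviour of the complete-input variants. The point it illustrates is that `tag` only matches at the very start of the input, so a bare `tag` fails whenever the header is not the first thing in the buffer, while `take_until` stops in front of the pattern without consuming it.

```rust
/// Imitates `take_until`: returns (rest, taken), where `rest` starts at the
/// first occurrence of `pattern`. Returns None if the pattern never occurs.
fn take_until_std<'a>(pattern: &[u8], input: &'a [u8]) -> Option<(&'a [u8], &'a [u8])> {
    if pattern.is_empty() {
        // `windows(0)` would panic; an empty pattern matches immediately.
        return Some((input, &input[..0]));
    }
    let pos = input.windows(pattern.len()).position(|w| w == pattern)?;
    Some((&input[pos..], &input[..pos]))
}

/// Imitates `tag`: succeeds only if `input` starts with `pattern`,
/// returning (rest, matched).
fn tag_std<'a>(pattern: &[u8], input: &'a [u8]) -> Option<(&'a [u8], &'a [u8])> {
    if input.starts_with(pattern) {
        Some((&input[pattern.len()..], &input[..pattern.len()]))
    } else {
        None
    }
}

/// Imitates `preceded`: runs `first`, throws its output away,
/// then runs `second` on whatever `first` left over.
fn preceded_std<'a>(
    first: impl Fn(&'a [u8]) -> Option<(&'a [u8], &'a [u8])>,
    second: impl Fn(&'a [u8]) -> Option<(&'a [u8], &'a [u8])>,
    input: &'a [u8],
) -> Option<(&'a [u8], &'a [u8])> {
    let (rest, _discarded) = first(input)?;
    second(rest)
}
```

With a buffer whose header sits a few bytes in, `tag_std` alone returns None, whereas `preceded_std` with `take_until_std` first and `tag_std` second skips the leading bytes and then matches the header.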
// Our preamble
const MAGIC: &[u8] = &[0x65, 0x51, 0x48, 0x54, 0x52];
// Our EOF byte sequence
const EOF: &[u8] = &[0xff];

// Shorthand to catch EOF
fn match_to_eof(data: &[u8]) -> nom::IResult<&[u8], &[u8]> {
    nom::bytes::complete::take_until(EOF)(data)
}

// Shorthand to catch the preamble
fn take_until_preamble(data: &[u8]) -> nom::IResult<&[u8], &[u8]> {
    nom::bytes::complete::take_until(MAGIC)(data)
}

pub fn extract_from_data(data: &[u8]) -> Option<(&[u8], &[u8])> {
    let preamble_parser = nom::sequence::preceded(
        // Ditch anything before the preamble
        take_until_preamble,
        nom::sequence::preceded(
            // Ditch the preamble
            nom::bytes::complete::tag(MAGIC),
            // And take until the EOF (0xff)
            match_to_eof
        )
    );
    // And we swap the elements because it's confusing AF
    // as a return function
    preamble_parser(data).ok().map(|r| {
        (r.1, r.0)
    })
}
The code should be annotated well enough to follow. This ditches any bytes until it finds the preamble bytes, then ditches those and keeps everything until it finds the EOF byte sequence ([0xff]).
It then returns a reversed nom result, because it was an example. You can un-reverse it to combine it with other parsers if you like. The first element is the content of the sequence; the second is the remaining input, which starts at the EOF byte because take_until does not consume the pattern it stops at. This means that you can iterate with this function (I did that in a test in the repo I put on GitHub).
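The iteration mentioned above could look roughly like the sketch below. It is not the test from the repository. `collect_blocks` is a hypothetical helper that accepts any function with the same signature as `extract_from_data`, and `extract_std` is a std-only stand-in included only so the snippet compiles without nom. It assumes the same preamble bytes and 0xff terminator as the code above.

```rust
// Same preamble as in the answer; the terminator is a single byte here.
const MAGIC: &[u8] = &[0x65, 0x51, 0x48, 0x54, 0x52];
const EOF: u8 = 0xff;

/// Std-only stand-in with the same return shape as `extract_from_data`:
/// (block content, remaining input starting at the terminator byte).
/// Returns None if no preamble is found or the block is never terminated.
fn extract_std(data: &[u8]) -> Option<(&[u8], &[u8])> {
    // Skip everything before the preamble, then the preamble itself.
    let start = data.windows(MAGIC.len()).position(|w| w == MAGIC)? + MAGIC.len();
    let body = &data[start..];
    // Keep bytes up to, but not including, the terminator.
    let end = body.iter().position(|&b| b == EOF)?;
    Some((&body[..end], &body[end..]))
}

/// Repeatedly applies `extract` to the leftover input and collects
/// every block it yields, stopping at the first None.
fn collect_blocks<'a>(
    mut data: &'a [u8],
    extract: impl Fn(&'a [u8]) -> Option<(&'a [u8], &'a [u8])>,
) -> Vec<&'a [u8]> {
    let mut blocks = Vec::new();
    // Each success consumes at least the preamble, so the loop terminates.
    while let Some((content, rest)) = extract(data) {
        blocks.push(content);
        data = rest;
    }
    blocks
}
```

Passing `extract_from_data` instead of `extract_std` should work the same way, since both return the content first and the leftover input second. Note that in this sketch a final block with no trailing 0xff is dropped rather than returned; if the last block in the real file runs to the end of the file without a terminator, it would need separate handling.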
Upvotes: 8