Reputation: 45
I am trying to make a custom mutator for regular expressions for software fuzzing purposes. I want to create a program which takes in a (valid) regular expression through stdin, parses this regular expression into an AST, then mutates the AST in random ways and then converts this mutated AST back into a regular expression.
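To make the goal concrete, here is a minimal sketch of the mutate-then-print step I have in mind. It uses a hypothetical toy `Node` enum rather than `regex_syntax::ast::Ast`, a hand-rolled `XorShift` generator instead of the `rand` crate, and helper names (`mutate`, `wrap_in_star`) that I made up for illustration. It also assumes literals are plain ASCII letters, so no escaping is handled.

```rust
use std::fmt;

// Hypothetical toy AST for illustration only; NOT regex_syntax::ast::Ast.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Literal(char), // assumed to be a plain letter, so no escaping is needed
    Concat(Vec<Node>),
    Alternation(Vec<Node>),
    Star(Box<Node>),
}

// Printing the tree yields a regex string again.
impl fmt::Display for Node {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Node::Literal(c) => write!(f, "{}", c),
            Node::Concat(items) => {
                for item in items {
                    write!(f, "{}", item)?;
                }
                Ok(())
            }
            Node::Alternation(items) => {
                write!(f, "(?:")?;
                for (i, item) in items.iter().enumerate() {
                    if i > 0 {
                        write!(f, "|")?;
                    }
                    write!(f, "{}", item)?;
                }
                write!(f, ")")
            }
            Node::Star(inner) => write!(f, "(?:{})*", inner),
        }
    }
}

// Tiny xorshift generator so the sketch needs no external crate.
// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    // Returns a value in 0..n; n must be greater than zero.
    fn below(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

// Replaces `node` with `(?:node)*`.
fn wrap_in_star(node: &mut Node) {
    let old = std::mem::replace(node, Node::Literal('_'));
    *node = Node::Star(Box::new(old));
}

// Applies one random structural mutation somewhere in the tree.
fn mutate(node: &mut Node, rng: &mut XorShift) {
    let choice = rng.below(3);
    match node {
        // Descend into a random child of a non-empty sequence.
        Node::Concat(items) | Node::Alternation(items) if choice == 0 && !items.is_empty() => {
            let i = rng.below(items.len());
            mutate(&mut items[i], rng);
        }
        // Descend into the repeated sub-expression.
        Node::Star(inner) if choice == 0 => mutate(inner, rng),
        // Swap a literal for a random lowercase letter.
        Node::Literal(c) if choice == 1 => *c = (b'a' + rng.below(26) as u8) as char,
        // Otherwise wrap the current node in a repetition.
        _ => wrap_in_star(node),
    }
}
```

Because every mutation goes from one well-formed tree to another, the printed result stays syntactically valid in this toy setting, which is the property I am after for the real `Ast`.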
I asked ChatGPT for some code and this is what it spit out:
use arbitrary::{Arbitrary, Unstructured};
use regex_syntax::ast::{parse::Parser, Ast};
use std::io::{self, Read};
fn main() {
    // 1️⃣ Read regex string from stdin
    let mut input = String::new();
    io::stdin().read_to_string(&mut input).expect("Failed to read input");
    // Trim input (to remove newlines from `echo`)
    let input = input.trim();
    // 2️⃣ Parse regex into an AST
    let parser = Parser::new();
    let ast = match parser.parse(input) {
        Ok(ast) => ast,
        Err(err) => {
            eprintln!("Failed to parse regex: {:?}", err);
            return;
        }
    };
    // 3️⃣ Serialize AST into bytes
    let mut ast_bytes = bincode::serialize(&ast).expect("Failed to serialize AST");
    // 4️⃣ Mutate the serialized bytes
    mutate_bytes(&mut ast_bytes);
    // 5️⃣ Deserialize the mutated bytes back into an AST
    let mutated_ast: Ast = match bincode::deserialize(&ast_bytes) {
        Ok(ast) => ast,
        Err(err) => {
            eprintln!("Failed to deserialize mutated AST: {:?}", err);
            return;
        }
    };
    // 6️⃣ Convert mutated AST back into a regex string
    let mutated_regex = format!("{}", mutated_ast);
    // 7️⃣ Print mutated regex
    println!("Mutated Regex: {}", mutated_regex);
}

/// Simple byte mutation function
fn mutate_bytes(bytes: &mut Vec<u8>) {
    if bytes.is_empty() {
        return;
    }
    let mut rng = rand::thread_rng();
    let index = rng.gen_range(0..bytes.len());
    bytes[index] ^= 0xFF; // Flip bits at a random position
}
and here is my Cargo.toml:
[package]
name = "rust_random_valid_regex"
version = "0.1.0"
edition = "2024"
[dependencies]
rand = "0.8"
arbitrary = { version = "1.4.1", features = ["derive"] }
libfuzzer-sys = { version = "0.4.1", features = ["arbitrary-derive"] }
bincode = "2.0.0-rc.3"
regex-syntax = { version = "0.8", features = ["arbitrary"] }
serde = { version = "1.0", features = ["derive"] }
Upon compiling I get these errors:
Compiling rust_random_valid_regex v0.1.0 (/home/oof/ruby/regexfuzzer/custom_mut/rust_random_valid_regex)
error[E0425]: cannot find function `serialize` in crate `bincode`
--> src/main.rs:52:34
|
52 | let mut ast_bytes = bincode::serialize(&ast).expect("Failed to serialize AST");
| ^^^^^^^^^ not found in `bincode`
error[E0425]: cannot find function `deserialize` in crate `bincode`
--> src/main.rs:58:43
|
58 | let mutated_ast: Ast = match bincode::deserialize(&ast_bytes) {
| ^^^^^^^^^^^ not found in `bincode`
warning: unused imports: `Arbitrary` and `Unstructured`
--> src/main.rs:29:17
|
29 | use arbitrary::{Arbitrary, Unstructured};
| ^^^^^^^^^ ^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
error[E0599]: no method named `gen_range` found for struct `ThreadRng` in the current scope
--> src/main.rs:79:21
|
79 | let index = rng.gen_range(0..bytes.len());
| ^^^^^^^^^
|
::: /home/oof/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rand-0.8.5/src/rng.rs:129:8
|
129 | fn gen_range<T, R>(&mut self, range: R) -> T
| --------- the method is available for `ThreadRng` here
|
= help: items from traits can only be used if the trait is in scope
help: there is a method `gen_ratio` with a similar name, but with different arguments
--> /home/oof/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rand-0.8.5/src/rng.rs:299:5
|
299 | fn gen_ratio(&mut self, numerator: u32, denominator: u32) -> bool {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: trait `Rng` which provides `gen_range` is implemented but not in scope; perhaps you want to import it
|
29 + use rand::Rng;
|
Some errors have detailed explanations: E0425, E0599.
For more information about an error, try `rustc --explain E0425`.
warning: `rust_random_valid_regex` (bin "rust_random_valid_regex") generated 1 warning
error: could not compile `rust_random_valid_regex` (bin "rust_random_valid_regex") due to 3 previous errors; 1 warning emitted
implying that bincode doesn't know how to serialize/deserialize a regex Ast. I tried looking through the documentation, but couldn't find anything about serializing an AST into bytes. I think there is a way to do this, since the fuzzer for the regex crate converts bytes into an Ast using Arbitrary. Therefore it makes sense that there would be a function for the reverse direction (to convert an Ast to raw bytes which can then be mutated).
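The E0599 error in the log is a separate matter from serialization: as the compiler hint says, a trait method can only be called when its trait is in scope, which is why it suggests `use rand::Rng;`. A minimal std-only sketch of that rule, using a hypothetical `shapes` module, `Area` trait, and `square_area` helper that are not part of my project:

```rust
mod shapes {
    pub struct Square(pub u32);

    // `area` is a trait method, not an inherent method of `Square`.
    pub trait Area {
        fn area(&self) -> u32;
    }

    impl Area for Square {
        fn area(&self) -> u32 {
            self.0 * self.0
        }
    }
}

// Without this import, `sq.area()` below is rejected with E0599,
// in the same way `rng.gen_range(..)` is rejected without `use rand::Rng;`.
use shapes::Area;

fn square_area(side: u32) -> u32 {
    let sq = shapes::Square(side);
    sq.area()
}
```

So that one error should go away with the import the compiler proposes, independently of the bincode question.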
You can take a look at the complete source code here on my GitHub.
How do I modify the AST properly?
Thanks in advance!
Edit: My point is that since the other fuzzer has a way to convert bytes into a regex AST, there should probably be a way to do the reverse, that is, to convert an AST into a series of bytes, but I couldn't find such a function anywhere. Does such functionality exist?
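To show which direction I mean, here is a toy sketch of a byte-driven decoder, written only in the spirit of how I understand such fuzzers to consume input. It is not the `arbitrary` crate's implementation; `Node`, `ByteSource`, and `decode` are hypothetical names, and the tag scheme (`byte % 3`) is my own assumption.

```rust
// Hypothetical toy tree; NOT regex_syntax::ast::Ast.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Literal(char),
    Star(Box<Node>),
    Concat(Box<Node>, Box<Node>),
}

// Hands out input bytes one at a time and yields 0 once the input runs out.
struct ByteSource<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> ByteSource<'a> {
    fn new(data: &'a [u8]) -> Self {
        ByteSource { data, pos: 0 }
    }

    fn next_byte(&mut self) -> u8 {
        let b = self.data.get(self.pos).copied().unwrap_or(0);
        self.pos += 1;
        b
    }
}

// Builds a tree by letting each byte pick a variant; `depth` bounds recursion.
fn decode(src: &mut ByteSource<'_>, depth: u32) -> Node {
    let tag = if depth == 0 { 0 } else { src.next_byte() % 3 };
    match tag {
        0 => Node::Literal((b'a' + src.next_byte() % 26) as char),
        1 => Node::Star(Box::new(decode(src, depth - 1))),
        _ => {
            let left = decode(src, depth - 1);
            let right = decode(src, depth - 1);
            Node::Concat(Box::new(left), Box::new(right))
        }
    }
}
```

In this toy, the inputs `[0, 1]` and `[3, 27]` both decode to the same `Literal('b')`, so an exact inverse would not be unique. That may be part of why I cannot find a ready-made AST-to-bytes function, but I am not sure.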
Upvotes: -2
Views: 42