How can I randomly mutate a Rust regex AST?

I am trying to write a custom mutator for regular expressions for software fuzzing purposes. I want a program that reads a (valid) regular expression from stdin, parses it into an AST, mutates the AST in random ways, and then converts the mutated AST back into a regular expression.

I asked ChatGPT for some code and this is what it spit out:

use arbitrary::{Arbitrary, Unstructured};
use regex_syntax::ast::{parse::Parser, Ast};
use std::io::{self, Read};

fn main() {
    // 1️⃣ Read regex string from stdin
    let mut input = String::new();
    io::stdin().read_to_string(&mut input).expect("Failed to read input");

    // Trim input (to remove newlines from `echo`)
    let input = input.trim();

    // 2️⃣ Parse regex into an AST
    let parser = Parser::new();
    let ast = match parser.parse(input) {
        Ok(ast) => ast,
        Err(err) => {
            eprintln!("Failed to parse regex: {:?}", err);
            return;
        }
    };

    // 3️⃣ Serialize AST into bytes
    let mut ast_bytes = bincode::serialize(&ast).expect("Failed to serialize AST");

    // 4️⃣ Mutate the serialized bytes
    mutate_bytes(&mut ast_bytes);

    // 5️⃣ Deserialize the mutated bytes back into an AST
    let mutated_ast: Ast = match bincode::deserialize(&ast_bytes) {
        Ok(ast) => ast,
        Err(err) => {
            eprintln!("Failed to deserialize mutated AST: {:?}", err);
            return;
        }
    };

    // 6️⃣ Convert mutated AST back into a regex string
    let mutated_regex = format!("{}", mutated_ast);

    // 7️⃣ Print mutated regex
    println!("Mutated Regex: {}", mutated_regex);
}

/// Simple byte mutation function
fn mutate_bytes(bytes: &mut Vec<u8>) {
    if bytes.is_empty() {
        return;
    }
    let mut rng = rand::thread_rng();
    let index = rng.gen_range(0..bytes.len());
    bytes[index] ^= 0xFF; // Flip bits at a random position
}

and here is my Cargo.toml:

[package]
name = "rust_random_valid_regex"
version = "0.1.0"
edition = "2024"

[dependencies]
rand = "0.8"
arbitrary = { version = "1.4.1", features = ["derive"] }
libfuzzer-sys = { version = "0.4.1", features = ["arbitrary-derive"] }
bincode = "2.0.0-rc.3"
regex-syntax = { version = "0.8", features = ["arbitrary"] }
serde = { version = "1.0", features = ["derive"] }

Upon compiling I get these errors:

   Compiling rust_random_valid_regex v0.1.0 (/home/oof/ruby/regexfuzzer/custom_mut/rust_random_valid_regex)
error[E0425]: cannot find function `serialize` in crate `bincode`
  --> src/main.rs:52:34
   |
52 |     let mut ast_bytes = bincode::serialize(&ast).expect("Failed to serialize AST");
   |                                  ^^^^^^^^^ not found in `bincode`

error[E0425]: cannot find function `deserialize` in crate `bincode`
  --> src/main.rs:58:43
   |
58 |     let mutated_ast: Ast = match bincode::deserialize(&ast_bytes) {
   |                                           ^^^^^^^^^^^ not found in `bincode`

warning: unused imports: `Arbitrary` and `Unstructured`
  --> src/main.rs:29:17
   |
29 | use arbitrary::{Arbitrary, Unstructured};
   |                 ^^^^^^^^^  ^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

error[E0599]: no method named `gen_range` found for struct `ThreadRng` in the current scope
   --> src/main.rs:79:21
    |
79  |     let index = rng.gen_range(0..bytes.len());
    |                     ^^^^^^^^^
    |
   ::: /home/oof/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rand-0.8.5/src/rng.rs:129:8
    |
129 |     fn gen_range<T, R>(&mut self, range: R) -> T
    |        --------- the method is available for `ThreadRng` here
    |
    = help: items from traits can only be used if the trait is in scope
help: there is a method `gen_ratio` with a similar name, but with different arguments
   --> /home/oof/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rand-0.8.5/src/rng.rs:299:5
    |
299 |     fn gen_ratio(&mut self, numerator: u32, denominator: u32) -> bool {
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: trait `Rng` which provides `gen_range` is implemented but not in scope; perhaps you want to import it
    |
29  + use rand::Rng;
    |

Some errors have detailed explanations: E0425, E0599.
For more information about an error, try `rustc --explain E0425`.
warning: `rust_random_valid_regex` (bin "rust_random_valid_regex") generated 1 warning
error: could not compile `rust_random_valid_regex` (bin "rust_random_valid_regex") due to 3 previous errors; 1 warning emitted

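The gen_range error is only a missing "use rand::Rng;", as the compiler suggests. The two bincode errors show that bincode::serialize and bincode::deserialize do not exist in the bincode 2.x release candidate I depend on (they belong to the 1.x API), and as far as I can tell the regex Ast type does not implement the serde traits that bincode would need anyway. I looked through the documentation but couldn't find anything about serializing an AST into bytes. I think there should be a way to do this, since the fuzzer for the regex crate turns raw bytes into an Ast using Arbitrary. It would therefore make sense for a function to exist in the reverse direction, converting an Ast into raw bytes that can then be mutated.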
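For context, this is my understanding of how the bytes-to-AST direction works, sketched on a toy AST with only the standard library. Everything here (Node, Bytes, letter, decode, render) is a hypothetical stand-in I made up for illustration; it is not the regex_syntax Ast type and not the actual API of the arbitrary crate. The bytes are read as a stream of decisions, so any byte string, including a mutated one, decodes to some well-formed tree:

```rust
/// Toy regex AST. A hypothetical stand-in, NOT regex_syntax::ast::Ast.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Literal(char),
    Star(Box<Node>),
    Concat(Box<Node>, Box<Node>),
}

/// Cursor over fuzzer-provided bytes (only loosely inspired by the idea
/// behind arbitrary's Unstructured; this is not its real API).
struct Bytes<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> Bytes<'a> {
    fn new(data: &'a [u8]) -> Self {
        Bytes { data, pos: 0 }
    }

    /// Next byte, or 0 once the input is exhausted, so decoding never fails.
    fn next(&mut self) -> u8 {
        let byte = self.data.get(self.pos).copied().unwrap_or(0);
        self.pos += 1;
        byte
    }
}

/// Map any byte onto a lowercase ASCII letter so no escaping is needed.
fn letter(byte: u8) -> char {
    (b'a' + byte % 26) as char
}

/// Each tag byte picks a node kind; `depth` bounds the recursion.
fn decode(input: &mut Bytes<'_>, depth: u32) -> Node {
    let tag = input.next();
    if depth == 0 {
        return Node::Literal(letter(tag));
    }
    match tag % 3 {
        0 => Node::Literal(letter(input.next())),
        1 => Node::Star(Box::new(decode(input, depth - 1))),
        _ => Node::Concat(
            Box::new(decode(input, depth - 1)),
            Box::new(decode(input, depth - 1)),
        ),
    }
}

/// Print the toy tree back as a regex string.
fn render(node: &Node) -> String {
    match node {
        Node::Literal(c) => c.to_string(),
        Node::Star(inner) => format!("(?:{})*", render(inner)),
        Node::Concat(a, b) => format!("{}{}", render(a), render(b)),
    }
}

fn main() {
    let original = [1u8, 0, 7];
    let mut flipped = original;
    flipped[0] ^= 0xFF; // same kind of byte flip as mutate_bytes above
    for data in &[original, flipped] {
        let ast = decode(&mut Bytes::new(data), 4);
        println!("{:?} -> {}", data, render(&ast));
    }
}
```

In this sketch the flipped input still decodes to a valid tree, which is different from flipping a byte inside a fixed serialization format, where the result may simply fail to deserialize.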

You can take a look at the complete source code here on my GitHub.

How do I modify the AST properly?

Thanks in advance!

Edit: My point is that since the other fuzzer has a way to convert bytes into a regex AST, there should probably be a way to do the reverse, that is, to convert an AST into a series of bytes. I couldn't find such a function anywhere. Does such functionality exist?
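To make the goal concrete, this is roughly the kind of structural mutation I have in mind. It is only a minimal sketch on a hypothetical toy AST: Node, XorShift, mutate and random_letter are names I made up for illustration, and it uses neither the real regex_syntax Ast type nor the rand crate. It shows the idea, not a solution:

```rust
use std::fmt;

/// Toy regex AST. A hypothetical stand-in, NOT regex_syntax::ast::Ast.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Literal(char),
    Concat(Vec<Node>),
    Alternation(Vec<Node>),
    Star(Box<Node>),
}

impl fmt::Display for Node {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Node::Literal(c) => write!(f, "{}", c),
            Node::Concat(items) => {
                for item in items {
                    write!(f, "{}", item)?;
                }
                Ok(())
            }
            Node::Alternation(items) => {
                write!(f, "(?:")?;
                for (i, item) in items.iter().enumerate() {
                    if i > 0 {
                        write!(f, "|")?;
                    }
                    write!(f, "{}", item)?;
                }
                write!(f, ")")
            }
            Node::Star(inner) => write!(f, "(?:{})*", inner),
        }
    }
}

/// Tiny deterministic xorshift generator, standing in for the rand crate.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in 0..n; callers must pass n > 0.
    fn below(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

fn random_letter(rng: &mut XorShift) -> char {
    (b'a' + rng.below(26) as u8) as char
}

/// Pick a node by a random walk from the root, then wrap it in a new construct.
fn mutate(node: &mut Node, rng: &mut XorShift) {
    if rng.below(2) == 0 {
        match node {
            Node::Concat(items) | Node::Alternation(items) if !items.is_empty() => {
                let i = rng.below(items.len());
                return mutate(&mut items[i], rng);
            }
            Node::Star(inner) => return mutate(inner, rng),
            _ => {}
        }
    }
    // Take the subtree out (leaving a placeholder) and rebuild around it.
    let old = std::mem::replace(node, Node::Literal('x'));
    *node = match rng.below(3) {
        0 => Node::Star(Box::new(old)),
        1 => Node::Alternation(vec![old, Node::Literal(random_letter(rng))]),
        _ => Node::Concat(vec![old.clone(), old]),
    };
}

fn main() {
    // Hand-built tree for "a(?:b)*"; the real program would get it from the parser.
    let mut ast = Node::Concat(vec![
        Node::Literal('a'),
        Node::Star(Box::new(Node::Literal('b'))),
    ]);
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15); // fixed nonzero seed
    println!("original: {}", ast);
    for round in 1..=3 {
        mutate(&mut ast, &mut rng);
        println!("mutation {}: {}", round, ast);
    }
}
```

Because every rewrite wraps an existing subtree in a well-formed construct, the printed result stays syntactically valid in this toy grammar. That is the property I would like to keep when mutating the real Ast.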
