d-_-b

Reputation: 23161

Node.js: pass in an array of strings for grep -f --fixed-strings

I'd like to use grep --count --fixed-strings --file=needles.txt < haystack.txt from a Node.js environment.

Instead of having a file for needles.txt, I have an array of strings to search, and instead of haystack.txt I have a large string/buffer of text.

What's the best combination of child_process methods to use?

Something like:

import {spawn} from "child_process";
import {Readable} from "stream";

// haystack to search within
const haystack = "I am \n such a big string, do you\n see me?";
const readable = new Readable();
readable.push(haystack);
readable.push(null);

// the list of needles that would normally go in `--file=needles.txt`
const needles = ["find", "me", "or", "me"];

// spawn `fgrep`
// Q: How do I pass in `needles` as a string?
const fgrep = spawn(`fgrep`, [needles])

// pipe my haystack to fgrep
readable.pipe(fgrep.stdin);

grep documentation

Upvotes: 0

Views: 306

Answers (1)

Matt

Reputation: 74670

For the grep args, -e lets you specify multiple patterns:

grep -e 1 -e 2

The JS for generating the args will be something like:

const needles = ["find", "me", "or", "me"];
const grep_pattern_args = needles.reduce((res, pattern) => {
    res.push('-e', pattern)
    return res
}, [])
const grep_args = [ '--count', '--fixed-strings', ...grep_pattern_args ]
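
As a side note, the same argument list can be built with flatMap instead of reduce. This is only an alternative sketch; the camelCase names are illustrative and not part of the original answer. Because spawn passes each array element to the process as-is (no shell is involved unless you ask for one), needles containing spaces or quotes need no escaping, and a needle that starts with a dash is safe because it follows -e.

```javascript
// Same sample needles as above.
const needles = ["find", "me", "or", "me"];

// flatMap emits an ['-e', pattern] pair per needle and flattens the pairs into one array.
const grepPatternArgs = needles.flatMap((pattern) => ["-e", pattern]);
const grepArgs = ["--count", "--fixed-strings", ...grepPatternArgs];

console.log(grepArgs);
// → ['--count', '--fixed-strings', '-e', 'find', '-e', 'me', '-e', 'or', '-e', 'me']
```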

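With thousands of needles (3000, say) you are heading into the territory of execve's argument size limits on Linux: a single argument is capped at MAX_ARG_STRLEN (128 KiB), and the combined size of all arguments plus the environment is limited as well (the exact figure depends on the stack size limit). If you have many or lengthy needles, you may need to write them to a file in any case to be safe.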
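
If you do go the file route, a minimal sketch could look like the following. It assumes a `grep` binary is on the PATH; the helper name countWithNeedleFile and the temporary file layout are made up for illustration. It uses spawnSync for brevity, which blocks the event loop while grep runs, so it suits small scripts more than servers.

```javascript
const { spawnSync } = require("node:child_process");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: counts the haystack lines that contain at least one needle.
function countWithNeedleFile(needles, haystack) {
  // A private temporary directory avoids file name collisions.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "needles-"));
  const needleFile = path.join(dir, "needles.txt");
  try {
    // grep reads one pattern per line, so a needle must not contain a newline.
    // An empty needle list yields an empty file, which matches nothing.
    fs.writeFileSync(needleFile, needles.map((n) => n + "\n").join(""));
    const result = spawnSync(
      "grep",
      ["--count", "--fixed-strings", `--file=${needleFile}`],
      { input: haystack, encoding: "utf8" }
    );
    if (result.error) throw result.error;
    // Exit status 1 only means "no line matched"; anything else non-zero is a failure.
    if (result.status !== 0 && result.status !== 1) {
      throw new Error(`grep failed (status ${result.status}): ${result.stderr}`);
    }
    return Number(result.stdout.trim());
  } finally {
    // Always clean up the temporary directory.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

For the sample data from the question, `countWithNeedleFile(["find", "me", "or", "me"], "I am \n such a big string, do you\n see me?")` should return 1, since only the last line contains a needle.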

spawn is a good fit because it gives you back a writable stream for stdin, which you can write to as the haystack is read or generated (assuming the Readable stream setup in your example is contrived):

const stdout = []
const stderr = []
const fgrep = spawn('/usr/bin/fgrep', grep_args, { stdio: ['pipe', 'pipe', 'pipe'] })
fgrep.on('error', console.error)

// For larger output you can process more on the stream. 
fgrep.stdout.on('data', chunk => stdout.push(chunk))
fgrep.stderr.on('data', chunk => {
  process.stderr.write(chunk)
  stderr.push(chunk)
})

fgrep.on('close', (code) => {
  // grep exits with 1 when no line matched; other non-zero codes are errors
  if (code !== 0 && code !== 1) console.error(`grep process exited with code ${code}`)
  stdout.forEach(chunk => process.stdout.write(chunk))
})

// haystream is the Readable that carries the haystack; pipe it into grep's stdin
haystream.pipe(fgrep.stdin)
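
Putting the pieces together, here is one possible end-to-end sketch that wraps the spawn call in a Promise. The helper name countMatchingLines is hypothetical, and the sketch assumes `grep` is on the PATH and that the haystack is already in memory as a string or Buffer. It treats exit code 1 (no line matched) as a valid result of 0 rather than as an error.

```javascript
const { spawn } = require("node:child_process");
const { Readable } = require("node:stream");

// Hypothetical helper: resolves with the number of haystack lines containing a needle.
function countMatchingLines(needles, haystack) {
  return new Promise((resolve, reject) => {
    const args = ["--count", "--fixed-strings", ...needles.flatMap((n) => ["-e", n])];
    const grep = spawn("grep", args, { stdio: ["pipe", "pipe", "pipe"] });
    const stdout = [];
    const stderr = [];

    // Fires when the process could not be started (for example, grep is not installed).
    grep.on("error", reject);
    // If grep exits before consuming all input, the write fails with EPIPE; ignore it here.
    grep.stdin.on("error", () => {});
    grep.stdout.on("data", (chunk) => stdout.push(chunk));
    grep.stderr.on("data", (chunk) => stderr.push(chunk));

    grep.on("close", (code) => {
      // 0 = at least one line matched, 1 = no line matched; anything else is a failure.
      if (code === 0 || code === 1) {
        resolve(Number(Buffer.concat(stdout).toString().trim()));
      } else {
        reject(new Error(`grep exited with code ${code}: ${Buffer.concat(stderr)}`));
      }
    });

    // Wrap the in-memory haystack in a Readable and pipe it into grep's stdin.
    Readable.from([haystack]).pipe(grep.stdin);
  });
}

// Usage with the sample data from the question.
countMatchingLines(
  ["find", "me", "or", "me"],
  "I am \n such a big string, do you\n see me?"
).then((count) => console.log(count)); // prints 1
```

Note that an empty needles array leaves grep without a pattern, so the promise rejects in that case; you may prefer to short-circuit and resolve with 0 instead.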

Upvotes: 1
