How to do a grep-like search on a very long line?

Question

grep is great at finding lines that match a pattern. But what if you have a file with a single extremely long line (say a 100MB file), and you want to find chunks within it that match a pattern?

For each match, you'd want to print the character offset, and the matched string, with extra characters on either side for context.

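In Python, with the re module imported, you could write something like this (the start of the slice is clamped at 0 so it cannot go negative near the beginning of the string):

[(m.start(), s[max(m.start() - 50, 0):m.end() + 50]) for m in re.finditer(regex, s)]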

But is there some way to do the equivalent using standard Linux command-line tools?

oguz ismail · Accepted Answer

For each match, you'd want to print the offset, and the matched string, with extra characters on either side for context.

You can do that with awk like this:

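awk '{
  i = 1
  # match() sets RSTART (1-based, relative to the substring) and RLENGTH
  while (match(substr($0, i), /regex/)) {
    off = i + RSTART - 1                    # 1-based offset in the whole line
    start = (off > 50 ? off - 50 : 1)       # clamp left context at line start
    print off, substr($0, start, off - start + RLENGTH + 50)
    i = off + RLENGTH                       # resume right after this match
  }
}' file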
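
For cross-checking the awk output, here is a minimal Python sketch of the same search with the left edge of the context window clamped. The function name find_with_context and its context parameter are hypothetical, chosen for illustration and not taken from the thread. Note that it reports 0-based offsets, as the Python expression in the question does, whereas the awk script prints 1-based offsets.

```python
import re


def find_with_context(pattern, text, context=50):
    """Return (offset, snippet) pairs for every match of pattern in text.

    offset is the 0-based start of the match; snippet is the match plus up
    to `context` characters on each side. (Hypothetical helper, not part of
    the original answer.)
    """
    results = []
    for m in re.finditer(pattern, text):
        # Clamp the left edge: a negative slice start would count from the end.
        start = max(m.start() - context, 0)
        # Slicing past the end of a string is safe in Python, so the right
        # edge needs no clamp.
        results.append((m.start(), text[start:m.end() + context]))
    return results


if __name__ == "__main__":
    line = "x" * 60 + "needle" + "y" * 60
    for offset, snippet in find_with_context(r"needle", line, context=5):
        print(offset, snippet)  # prints: 60 xxxxxneedleyyyyy
```

To run it on a real file, read the file into a string first and pass that as text; a 100MB line would then be held in memory in full, which is an assumption this sketch makes for simplicity.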
