Reputation: 43183
grep is great at finding lines that match a pattern. But what if you have a file with a single extremely long line (say a 100MB file), and you want to find chunks within it that match a pattern?
For each match, you'd want to print the character offset, and the matched string, with extra characters on either side for context.
In Python, you could write something like this (would need boundary checks):
[(m.start(), s[m.start()-50:m.end()+50]) for m in re.finditer(regex, s)]
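A boundary-safe version of that one-liner, for reference (the only fix needed is a max() so a negative start index doesn't wrap around to the end of the string):

```python
import re

# Hypothetical test data: one long string with a single match
s = "a" * 60 + "needle" + "b" * 60
regex = "needle"

# max() clamps the start of the context window at 0; a negative slice
# index would otherwise count from the end of the string.
# The end index needs no guard: slicing past the end is safe in Python.
matches = [(m.start(), s[max(m.start() - 50, 0):m.end() + 50])
           for m in re.finditer(regex, s)]

print(matches)  # → [(60, 'a'*50 + 'needle' + 'b'*50)]
```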
But is there some way to do the equivalent using standard linux command line tools?
Upvotes: 2
Views: 315
Reputation: 382612
How to truncate long matching lines returned by grep or ack gives a good solution, assuming the line fits into memory:
grep -Eo '.{0,15}needle.{0,15}' longlines.txt
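If you also want the offsets the question asks for, GNU grep's -b flag helps: combined with -o it prints the 0-based byte offset of each matching part rather than of the line. A quick demonstration on a made-up single-line file:

```shell
# Build a hypothetical single-line test file (no trailing newline)
printf 'aaaaaaaaaaneedlebbbbbbbbbbcccccccccneedleddddddddd' > longlines.txt

# -o prints each match on its own line; with GNU grep, -b prefixes
# the 0-based byte offset of the matching part itself
grep -boE '.{0,10}needle.{0,10}' longlines.txt
# 0:aaaaaaaaaaneedlebbbbbbbbbb
# 26:cccccccccneedleddddddddd
```

Note this is GNU grep behavior; POSIX leaves -b's interaction with -o unspecified, so BSD grep may differ.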
If the lines don't necessarily fit into memory, then have a look at bgrep: https://unix.stackexchange.com/questions/223078/best-way-to-grep-a-big-binary-file/758528#758528
bgrep `printf %s needle | od -t x1 -An -v | tr -d '\n '` myfile.bin
Upvotes: 2
Reputation: 50750
For each match, you'd want to print the offset, and the matched string, with extra characters on either side for context.
You can do that with awk like this:
awk '{
    i = 1                                 # 1-based position to search from
    while (match(substr($0, i), /regex/)) {
        off = i + RSTART - 1              # absolute (1-based) offset of the match
        # print the offset, then the match with up to 50 characters of
        # context on each side (clamped to the start of the line)
        print off, substr($0, off > 50 ? off - 50 : 1, RLENGTH + 100)
        i = off + RLENGTH                 # resume searching after this match
    }
}' file
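As a quick sanity check, here is the same script with /regex/ replaced by /needle/, run on a small made-up input. Note that awk's string offsets are 1-based, unlike Python's 0-based m.start():

```shell
# Hypothetical single-line input with two matches
printf 'aaaaaaaaaaneedlebbbbbbbbbbcccccccccneedleddddddddd' > file

awk '{
    i = 1
    while (match(substr($0, i), /needle/)) {
        off = i + RSTART - 1
        print off, substr($0, off > 50 ? off - 50 : 1, RLENGTH + 100)
        i = off + RLENGTH
    }
}' file
```

This prints offsets 11 and 36 (the input is shorter than the context window, so each match is shown with the whole line as context).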
Upvotes: 3