Reputation: 3
I have a huge number of very large text files and, using a shell script, want to search each file for a string such as "&abcdef" (which indicates the end of each record), and print only if it occurs on consecutive lines.
Contents of the input file file-a, an example of one of the files (there are other similar but huge files):
1239560059 TAB001
8E12222439 TAB001
84dswe6059 &abcdef
8229559179 &abcdef
8012156059 TAB001
804E122224 TAB001
8046317400 20120629 233000 20120629
8046005912 TAB001
8046559179 23222333 &abcdef
80463174E9 TAB001
8024360099 TAB001
8046316343 955912 &abcdef
8439559149 &abcdef
8044360059 TAB001
8046360059 TAB001
8034395879 &abcdef
Output required:
file-a has multiple occurrences of &abcdef in consecutive lines
Upvotes: 0
Views: 779
Reputation: 6768
You can use awk to keep track of the previous and current occurrences and, if they are on adjacent lines, print both lines.
awk 'BEGIN {prev=0} /&abcdef/ {if(prev==0) {prev=NR;line=$0} else {if((prev+1)==NR) {print line;print $0}; prev=NR; line=$0}}' file-a
BUGS: There is one. If a run contains more than two consecutive occurrences, for example on lines 11, 12 and 13, the code will print lines 11, 12, 12, 13. Otherwise it prints the pairs fine.
If you expect more than 2 consecutive lines of &abcdef to occur in the file, let me know and I'll modify this code to suit it.
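For what it's worth, here is one way the duplicate printing could be avoided: count the length of the current run and only print the held-back first line when the run reaches two. This is a rough sketch, not tested on huge files; the function name print_runs and the variables run and held are made up for illustration. It uses FNR rather than NR so that several files can be passed at once, and index() so the marker is matched as a fixed string.

```shell
# Hypothetical helper: print every line belonging to a run of two or more
# consecutive marker lines exactly once, as filename:lineno:line.
print_runs() {
    pat=$1
    shift
    awk -v pat="$pat" '
        FNR == 1 { run = 0 }                  # new file: forget any run in progress
        index($0, pat) {                      # fixed-string match on the marker
            run++
            if (run == 2) print FILENAME ":" (FNR - 1) ":" held   # emit the held first line once
            if (run >= 2) print FILENAME ":" FNR ":" $0
            held = $0
            next
        }
        { run = 0 }                           # a non-matching line ends the run
    ' "$@"
}
```

With this, occurrences on lines 11, 12 and 13 should come out as 11, 12, 13 rather than 11, 12, 12, 13. It would be called as: print_runs '&abcdef' file-a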
EDIT: forgot to include the filename in the code.
EDIT: This is very rudimentary and definitely can be improved. Here's the modified code.
awk 'BEGIN {prev=0} /&abcdef/ {if(prev==0) {prev=NR;line=$0; print FILENAME} else {if((prev+1)==NR) {print NR-1 ":" line;print NR ":" $0}; prev=NR; line=$0}}' file-a
EDIT: If you want the filename prepended just like the line number, then your code should be:
awk 'BEGIN {prev=0} /&abcdef/ {if(prev==0) {prev=NR;line=$0} else {if((prev+1)==NR) {print FILENAME ":" NR-1 ":" line;print FILENAME ":" NR ":" $0}; prev=NR; line=$0}}' file-a
EDIT: If you only need to print the filename and line number but not the lines themselves, then your code should be:
awk 'BEGIN {prev=0} /&abcdef/ {if(prev==0) {prev=NR;line=$0} else {if((prev+1)==NR) {print FILENAME ":" NR-1;print FILENAME ":" NR}; prev=NR; line=$0}}' file-a
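If all that is needed is the one-line message from the question, once per file, something along these lines might do. Treat it as a sketch under assumptions: the function name report_consecutive and the reported flag are hypothetical, and the message wording is copied from the "Output required" part of the question. State is reset on FNR == 1 so a match on the last line of one file and the first line of the next is not counted as consecutive.

```shell
# Hypothetical helper: name each file in which the marker appears on two
# or more consecutive lines, printing the message at most once per file.
report_consecutive() {
    pat=$1
    shift
    awk -v pat="$pat" '
        FNR == 1 { prev = 0; reported = 0 }   # reset state at the start of each file
        index($0, pat) {                      # fixed-string match, not a regex
            if (!reported && prev && FNR == prev + 1) {
                print FILENAME " has multiple occurrences of " pat " in consecutive lines"
                reported = 1
            }
            prev = FNR                        # remember where the last match was
        }
    ' "$@"
}
```

Run as report_consecutive '&abcdef' file-a against the sample above, this should print "file-a has multiple occurrences of &abcdef in consecutive lines" a single time. Several files can be given in one call, which avoids starting awk once per file.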
Upvotes: 2