user3097691

Shell script to print lines which match a pattern occurring on consecutive lines

I have a large number of very large text files. Using a shell script, I want to search each file for a string such as "&abcdef" (which marks the end of each record) and print the matching lines only when the string occurs on consecutive lines.

Contents of file-a, one example input file (the other files are similar and also very large):

1239560059   TAB001   
8E12222439   TAB001   
84dswe6059   &abcdef
8229559179   &abcdef
8012156059   TAB001  
804E122224   TAB001  
8046317400 20120629 233000  20120629 
8046005912   TAB001   
8046559179 23222333   &abcdef
80463174E9   TAB001    
8024360099   TAB001  
8046316343   955912   &abcdef
8439559149   &abcdef
8044360059   TAB001    
8046360059   TAB001    
8034395879   &abcdef

Output required:

Answers (1)

alvits

You can use awk to keep track of the previous and current occurrence; if they are on adjacent lines, print both lines.

awk 'BEGIN {prev=0} /&abcdef/ {if(prev==0) {prev=NR;line=$0} else {if((prev+1)==NR) {print line;print $0}; prev=NR; line=$0}}' file-a

BUGS: There is one. If more than two occurrences are consecutive, say lines 11, 12 and 13, the code prints lines 11, 12, 12, 13 (line 12 twice). Pairs are printed correctly.

If you expect more than two consecutive lines of &abcdef in a file, let me know and I'll modify this code to suit it.
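A minimal sketch of such a modification, assuming every line in a run of two or more consecutive &abcdef lines should be printed exactly once. It keeps the same prev/line bookkeeping and adds a `printed` flag (a name introduced here) recording whether the previous match has already been output, so line 12 in the 11, 12, 13 example is not repeated. The wrapper function name `print_consecutive` is hypothetical.

```shell
# Hypothetical helper: print each &abcdef line that has a matching neighbour,
# printing every line of a run (2, 3 or more lines) exactly once.
print_consecutive() {
  awk '
    /&abcdef/ {
      if (prev && prev + 1 == NR) {
        # The previous match is on the line directly above this one.
        if (!printed) print line   # print it unless it was printed already
        print $0
        printed = 1                # the current line has now been printed
      } else {
        printed = 0                # isolated so far, nothing printed yet
      }
      prev = NR; line = $0         # remember this match for the next one
    }
  ' "$@"
}
```

For example, `print_consecutive file-a` should print the four lines starting 84dswe6059, 8229559179, 8046316343 and 8439559149 (lines 3-4 and 12-13 of the sample); the matches on lines 9 and 16 have no matching neighbour and are skipped.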

EDIT: Forgot to include the file name (file-a) in the command.

EDIT: This is very rudimentary and can definitely be improved. Here's a modified version that prints the file name at the first match and prefixes each printed line with its line number:

awk 'BEGIN {prev=0} /&abcdef/ {if(prev==0) {prev=NR;line=$0; print FILENAME} else {if((prev+1)==NR) {print NR-1 ":" line;print NR ":" $0}; prev=NR; line=$0}}' file-a

EDIT: If you want the file name prepended as well, just like the line number, use:

awk 'BEGIN {prev=0} /&abcdef/ {if(prev==0) {prev=NR;line=$0} else {if((prev+1)==NR) {print FILENAME ":" NR-1 ":" line;print FILENAME ":" NR ":" $0}; prev=NR; line=$0}}' file-a

EDIT: If you only need the file name and line numbers, not the lines themselves, use:

awk 'BEGIN {prev=0} /&abcdef/ {if(prev==0) {prev=NR;line=$0} else {if((prev+1)==NR) {print FILENAME ":" NR-1;print FILENAME ":" NR}; prev=NR; line=$0}}' file-a
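Since the question involves many files, one caveat applies to the commands above when several files are passed to a single awk run: NR keeps counting across files and prev is never reset, so line numbers are cumulative after the first file, and a match on the last line of one file can be paired with a match on the first line of the next. A sketch, assuming per-file results are wanted: it uses FNR (which restarts at 1 for each input file), resets the state when a new file starts, and combines the run fix with the FILENAME:line:text output. The function name `print_consecutive_files` is hypothetical.

```shell
# Hypothetical helper: print FILENAME:LINENO:LINE for each &abcdef line that has
# a matching neighbour, with line numbers and state kept separately per file.
print_consecutive_files() {
  awk '
    FNR == 1 { prev = 0; printed = 0 }   # new file: forget the previous match
    /&abcdef/ {
      if (prev && prev + 1 == FNR) {
        if (!printed) print FILENAME ":" prev ":" line
        print FILENAME ":" FNR ":" $0
        printed = 1
      } else {
        printed = 0
      }
      prev = FNR; line = $0
    }
  ' "$@"
}
```

Call it as `print_consecutive_files file-a file-b ...`. For a very large number of files, the same awk program can be passed to `find DIR -type f -exec awk '...' {} +` (DIR is a placeholder), which hands file names to awk in batches that fit the system's argument-length limit; because the state is reset per file, batching does not change the result.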
