awk: search patterns in the multi-string file

Question

My awk script searches the log file for two search patterns and then prints the strings where they were found to another file.

awk -F# -v pat1="$search_pattern1" -v pat2="$search_pattern2" '{ for (i = 1; i <= NF; i++) {if (match($i, "^1\.[0-9]+\/\A "pat1)) {sub(/^1\./, "", $i); sub(/\/.*/, "", $i); if (first == "") first = $i; if ($i in b) {first = $i; exit} a[$i]} else if (match($i, "^1\.[0-9]+\/\A "pat2)) {sub(/^1\./, "", $i); sub(/\/.*/, "", $i); if ($i in a) {first = $i; exit} b[$i] }}} END {if (first == "") print "1"; else print first}' search_file.log

My question is about this part:

{if (match($i, "^1\.[0-9]+\/\A "pat1))

Currently it finds pat1 only when it directly follows "/A", for example in a string like

06I_nsp5holoHIE_pp2.pdb #1.1/A pat1 NE2

How could I modify the regex to find pat1 either after /A or after /?, so that it is also identified in a string like:

06I_nsp5holoHIE_pp2.pdb #1.1/? pat1 NE2

Daweo · Accepted Answer

How could I modify the regex to find pat1 either after /A or after /?

I would use [ and ] (a bracket expression) with the acceptable characters enumerated inside. Consider a simplified example. Let the content of file.txt be

1.1/A pat1 NE2 
1.1/? pat1 NE2 
1.1/Z pat1 NE2

then

awk 'match($0,"1\.[0-9]+\/[A?]"){print NR, RSTART, RLENGTH}' file.txt

gives output

1 1 5
2 1 5

Explanation: when a match is found, I print the row number, the start position of the match, and the length of the match. Note that ? is a literal ? inside [ and ]. You could use | (alternation) instead of [ and ], but in that case you must escape the ?.

(tested in gawk 4.2.1)
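
Applied to the dynamic regex from the question, both variants might look like the sketch below. This is not part of the tested answer: sample.log and the value HIE for pat1 are hypothetical stand-ins, and pat1 is assumed to be a plain string without regex metacharacters. Because the regex is built from a string, each backslash is doubled; awk processes string escapes once before the regex engine sees the pattern.

```shell
# Hypothetical sample lines in the question's format; HIE stands in for pat1.
printf '%s\n' '06I_nsp5holoHIE_pp2.pdb #1.1/A HIE NE2' \
              '06I_nsp5holoHIE_pp2.pdb #1.2/? HIE NE2' \
              '06I_nsp5holoHIE_pp2.pdb #1.3/Z HIE NE2' > sample.log

# Variant 1: bracket expression. Inside [ ] the ? is a literal character.
# "\\." in the string becomes \. in the regex, i.e. a literal dot.
bracket=$(awk -F# -v pat1="HIE" '{
    for (i = 1; i <= NF; i++)
        if (match($i, "^1\\.[0-9]+/[A?] " pat1))
            print NR, $i
}' sample.log)

# Variant 2: alternation. Here the ? must be escaped ("\\?" in the string),
# otherwise it would act as the "zero or one" quantifier.
alt=$(awk -F# -v pat1="HIE" '{
    for (i = 1; i <= NF; i++)
        if (match($i, "^1\\.[0-9]+/(A|\\?) " pat1))
            print NR, $i
}' sample.log)

printf '%s\n' "$bracket"
[ "$bracket" = "$alt" ] && echo "both variants agree"
```

With this sample, only the lines containing /A and /? are reported; the /Z line is skipped by both variants.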
