James Starlight

Reputation: 523

awk: search patterns in the multi-string file

My awk script searches the log file for two search patterns and then prints the strings where they were found into another file.

awk -F# -v pat1="$search_pattern1" -v pat2="$search_pattern2" '{ for (i = 1; i <= NF; i++) {if (match($i, "^1\\.[0-9]+\\/\\A "pat1)) {sub(/^1\./, "", $i); sub(/\/.*/, "", $i); if (first == "") first = $i; if ($i in b) {first = $i; exit} a[$i]} else if (match($i, "^1\\.[0-9]+\\/\\A "pat2)) {sub(/^1\./, "", $i); sub(/\/.*/, "", $i); if ($i in a) {first = $i; exit} b[$i] }}} END {if (first == "") print "1"; else print first}' search_file.log

I have a question about this part:

{if (match($i, "^1\\.[0-9]+\\/\\A "pat1))

Currently it finds pat1 only right after "/A", for example in a string like

06I_nsp5holoHIE_pp2.pdb #1.1/A pat1 NE2 

How could I modify the regex to be able to find pat1 either near /A or near /?, so that it is also identified in a string like:

06I_nsp5holoHIE_pp2.pdb #1.1/? pat1 NE2 

Upvotes: 0

Views: 46

Answers (2)

RARE Kpop Manifesto

Reputation: 2805

mawk 'sub("$","\f" index($_,__)) < NF' \
                                        \
  FS='^[^#]*.1[.][0-9]+[/][?A][ ]+|[ ]+' __='pat1'
06I_nsp5holoHIE_pp2.pdb #1.1/? pat1 NE2 
                                        32

To cross-validate that 32 is the character position where pat1 starts:

gcut -c 32- <<< '06I_nsp5holoHIE_pp2.pdb #1.1/? pat1 NE2 '   
pat1 NE2 
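
The position reported above can also be reproduced with a plain index() call. This is only a simplified sketch of the position check, not an equivalent of the full mawk command (it skips the FS-based test); the shell variable names line and pos and the awk variable pat are chosen here for illustration.

```shell
# Sample line copied from the question.
line='06I_nsp5holoHIE_pp2.pdb #1.1/? pat1 NE2 '

# index(s, t) returns the 1-based position of substring t in s (0 if absent).
pos=$(printf '%s\n' "$line" | awk -v pat='pat1' '{ print index($0, pat) }')
echo "$pos"
```

Note that index() does a literal substring search, so it reports where pat1 starts but does not check that it follows /A or /?.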

Upvotes: 0

Daweo

Reputation: 36370

How could I modify the regex to be able to find pat 1 either near /A or near /?

I would use [ and ] (a bracket expression) with the acceptable characters enumerated inside. Consider a simplified example; let the content of file.txt be

1.1/A pat1 NE2 
1.1/? pat1 NE2 
1.1/Z pat1 NE2

then

awk 'match($0,"1\\.[0-9]+\\/[A?]"){print NR, RSTART, RLENGTH}' file.txt

gives output

1 1 5
2 1 5

Explanation: if a match is found, I print the row number, the start position of the match, and the length of the match. The third line (with /Z) does not match, so nothing is printed for it. Observe that ? means a literal ? inside [ and ]. You might elect to use | (alternation) rather than [ and ], but in that case you must escape the ?.

(tested in gawk 4.2.1)
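
As a rough sketch of the two points above: the first command shows the alternation form with an escaped ?, and the second applies the bracket expression to the per-field match from the question. The shell variable names (alt_out, field_out, line) and the value passed as pat1 are assumptions made for illustration, and these commands were not part of the gawk 4.2.1 test mentioned above.

```shell
# Alternation form: outside a bracket expression "?" is a quantifier,
# so it is escaped; in a string regex the backslash is doubled.
alt_out=$(printf '%s\n' '1.1/A pat1 NE2' '1.1/? pat1 NE2' '1.1/Z pat1 NE2' |
  awk 'match($0, "1\\.[0-9]+\\/(A|\\?)") { print NR, RSTART, RLENGTH }')
echo "$alt_out"

# Bracket expression dropped into the question's match() call,
# with "#" as the field separator as in the original script.
line='06I_nsp5holoHIE_pp2.pdb #1.1/? pat1 NE2 '
field_out=$(printf '%s\n' "$line" |
  awk -F# -v pat1='pat1' '{
    for (i = 1; i <= NF; i++)
      if (match($i, "^1\\.[0-9]+\\/[A?] " pat1))
        print "matched field", i
  }')
echo "$field_out"
```

In the second command only the character after the slash changes relative to the question's regex: `\\A` becomes `[A?]`.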

Upvotes: 2
