user289944

Reputation: 143

Accurate AWK array searching

Can anybody offer some help getting this AWK to search correctly?

I need to search the "sample.txt" file for each of the 6 entries in the "combinations" file. However, the search must be attempted from every single character position, unlike an ordinary text-editor search box, which resumes after the end of each occurrence. I need overlapping matches counted, so that the result shows exactly how many times each combination occurs. For example, the search should find the combination "AAA" 3 times inside the string "AAAAA", not 1 time. See my previous post about this: BASH: Search a string and exactly display the exact number of times a substring happens inside it
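To make the difference concrete, here is a minimal sketch of the non-overlapping behaviour described above. The helper name `count_gsub` is hypothetical and only wraps awk's `gsub`, which resumes scanning after the end of each match; note that the pattern is treated as a regular expression here.

```shell
# Hypothetical helper: count matches the way gsub does (non-overlapping).
# $1 is the text to search, $2 is the pattern (interpreted as a regex).
count_gsub() {
    # "&" replaces each match with itself, so only the returned count matters
    awk -v s="$1" -v pat="$2" 'BEGIN { print gsub(pat, "&", s) }'
}

count_gsub AAAAA AAA   # prints 1, although AAA starts at positions 1, 2 and 3
```

The count I am after for this input is 3, one for each starting position.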

The sample.txt file is:

AAAAAHHHAAHH

The combinations file is:

AA  
HH  
AAA  
HHH  
AAH  
HHA  

How do I get the script

#!/bin/bash
awk 'NR==FNR {data=$0; next} {printf "%s %d \n",$1,gsub($1,$1,data)}' 'sample.txt' combinations > searchoutput

to output the desired output:

AA 5
HH 3
AAA 3
HHH 1
AAH 2
HHA 1 

instead of what it is currently outputting:

AA 3 
HH 2 
AAA 1 
HHH 1 
AAH 2 
HHA 1 

?

As we can see, the script only finds the combinations the way a text editor would, skipping past each match before looking again. I need it to try a match starting at every character, so that overlapping occurrences are counted and the desired output is produced.

How do I have the AWK output the desired output instead? Can't thank you enough.

Upvotes: 0

Views: 170

Answers (2)

jxc

Reputation: 13998

You might try this:

$ awk '{x="AAAAAHHHAAHH"; n=0}{
    while(t=index(x,$0)){n++; x=substr(x,t+1) } 
    print $0,n
}' combinations.txt 
AA 5
HH 3
AAA 3
HHH 1
AAH 2
HHA 1
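The same index/substr loop can also read the text from a file instead of hard-coding it. The following is only a sketch under a few assumptions: the wrapper name `count_overlaps` is made up for illustration, the text file holds a single line (with several lines, only the last one would be searched), and `$1` is used rather than `$0` so that trailing whitespace in the combinations file is ignored.

```shell
# Hypothetical wrapper: $1 is the text file, $2 is the combinations file.
count_overlaps() {
    awk '
        NR == FNR { data = $0; next }   # first file: remember the text
        $1 != "" {                      # skip blank lines in the combinations file
            x = data; n = 0
            # index() returns the 1-based position of the first match, or 0
            while ((t = index(x, $1)) > 0) {
                n++
                x = substr(x, t + 1)    # resume one character after the match START
            }
            print $1, n
        }
    ' "$1" "$2"
}
```

Called as `count_overlaps sample.txt combinations` (file names taken from the question), this should print the same six lines as above. Resuming at `t+1` rather than after the end of the match is what allows overlapping occurrences to be counted.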

Upvotes: 1

karakfa

Reputation: 67517

There may be a faster way that finds the first match and carries forward from that index, but this might be simpler:

$ awk 'NR==1{content=$0;next} 
            {c=0; len1=length($1); 
             for(i=1;i<=length(content)-len1+1;i++)
                c+=substr(content,i,len1)==$1;
             print $1,c}' file combs

AA 5
HH 3
AAA 3
HHH 1
AAH 2
HHA 1

Upvotes: 1
