BabaG

Reputation: 41

awk combine conditions and search to print?

There may be something on here like this but I haven't found it.

I have two conditions I'm trying to meet in order to print: one specific, one kind of open ended:

First,

$1 == "SCR"

Second, that the search string exists anywhere from $7 until the end of the line.

An example of the search string might be _RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf

So, in no particular syntax, I guess what I'm looking for is something like:

awk '$1 == "SCR" && _RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf exists anywhere in $7 to end of line {print $6}' inputfile.txt

Any help is welcomed. thanks.

Upvotes: 0

Views: 171

Answers (4)

Michael Back

Reputation: 1871

  • parameter used for search string
  • approach of blanking non-searchable records is used (most succinct & likely fastest)
  • index() used for exact match of string
awk -v p='_RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf' '
    $1 == "SCR" {
        f=$6
        $1=$2=$3=$4=$5=$6=""
        if (index($0, p))
            print f
    }
' inputfile.txt

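This works as long as the search string does not begin with delimiters. Blanking $1 through $6 rebuilds the record with OFS between the now-empty fields, so the record starts with a run of spaces that a search string with leading blanks could match.

The same script formatted to fit on a single line: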

awk -v p='_RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf' '$1 == "SCR" { f=$6; $1=$2=$3=$4=$5=$6=""; if (index($0, p)) print f }' inputfile.txt

I decided to have some fun with this little script today. Here is a side-effect-filled function variant that minimizes the use of global variables. The previous solution is much more straightforward (and the one we want), but the following code is both fun and educational:

awk -v p='_RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf' '
    function x(v,    f, i) {
        f=$6
        $1=$2=$3=$4=$5=$6=""
        i = index($0, v)
        $0=f
        return i
    }
    $1 == "SCR" && x(p)
' inputfile.txt
 

How the above works:

  • In awk, we can pass fewer arguments than the function definition declares. The unused parameters are effectively temporary (local) variables. By convention, awk programmers put extra spaces (four here) before them in the parameter list to mark them as temporaries.
  • We use essentially the same algorithm as before, but instead of calling print f on success, we get cute and put f back into the line buffer.
  • So, where is the print? The final line of the script is where the magic happens. If no action is specified for a pattern, awk uses the default action, which is to print the line buffer when the pattern succeeds. Success! :)
  • One last note on the order of execution in that last statement: awk uses short-circuit evaluation of &&, so there is no possibility of x(p) firing first and destroying $1 before $1 == "SCR" is evaluated.
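To see those points in action, here is a small run of the same function on made-up input. The sample lines and the short search string _RSabc are assumptions for illustration only, and the END block is added just to show that f never leaks into the global namespace.

```shell
# Hypothetical sample data: only the first line is an SCR record
# that has the search string somewhere after field 6.
out_fn=$(printf '%s\n' \
    'SCR a b c d KEEP x _RSabc y' \
    'SCR a b c d DROP x y' \
    'FOO a b c d SKIP _RSabc' |
awk -v p='_RSabc' '
    function x(v,    f, i) {      # f and i are locals (extra parameters)
        f = $6
        $1 = $2 = $3 = $4 = $5 = $6 = ""
        i = index($0, v)
        $0 = f                    # leave field 6 in the line buffer
        return i
    }
    $1 == "SCR" && x(p)           # no action, so the default print $0 runs
    END { print (f == "" ? "f is local" : "f leaked") }
')
printf '%s\n' "$out_fn"
```

Only KEEP is printed from the data lines: the second line fails index(), and the third never reaches x(p) because && short-circuits.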

Our favored language awk is not a threaded language, so with a little trickery we can repair this function and its implementation to be both efficient and "pure":

awk -v p='_RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf' '
    function x(v,    b, i) {
        b=$0
        $1=$2=$3=$4=$5=$6=""
        i = index($0, v)
        $0=b
        return i
    }
    $1 == "SCR" && x(p) {
        print $6
    }
' inputfile.txt

Why is the above important? We want our functions to not mutate state and to return without side effects. That way we can continue execution without worrying that our line buffer, or anything else, has changed. Yes, we have cheated in this last example by mutating global state and then repairing it, but by the end it is repaired; so, "no harm no foul," eh? In this new implementation we could in fact swap the order of x(p) and $1 == "SCR" and have no problems (except that it would be slower). More complex follow-on programs would benefit from keeping the function "pure," since that makes the code easier to reason about.
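A quick way to check that claim is to call x(p) first and compare the record before and after. This is only a sketch: the input line (with deliberately uneven spacing) and the short search string are made up, and the before variable is added purely for the comparison.

```shell
out_pure=$(printf '%s\n' 'SCR a b   c d KEEP x _RSabc y' |
awk -v p='_RSabc' '
    function x(v,    b, i) {
        b = $0                    # save the record
        $1 = $2 = $3 = $4 = $5 = $6 = ""
        i = index($0, v)
        $0 = b                    # restore it, which also re-splits the fields
        return i
    }
    { before = $0 }
    x(p) && $1 == "SCR" {         # reversed order still works
        print $6
        print ($0 == before ? "record restored" : "record changed")
    }
')
printf '%s\n' "$out_pure"
```

Because $0 is put back verbatim, even the original run of three spaces survives the call.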

Upvotes: 1

karakfa

Reputation: 67507

awk -v s='_RS1c80952b46129f' '$1=="SCR"{for(i=7;i<=NF;i++) 
                                          if($i==s) {print $6; next}}' file

Note that this is looking for an exact string match, not a regex.
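To make the distinction concrete, the sketch below runs three variants of the loop over the same made-up data: whole-field equality (as in the command above), a literal substring test with index(), and a regex test with ~. The data and the search string a.b are assumptions chosen so that the dot behaves differently in each case.

```shell
data='SCR 1 2 3 4 id1 axb
SCR 1 2 3 4 id2 a.b
SCR 1 2 3 4 id3 xa.bx'

# Whole-field equality: the field must be exactly "a.b".
eq=$(printf '%s\n' "$data" | awk -v s='a.b' \
    '$1=="SCR"{for(i=7;i<=NF;i++) if($i==s){print $6; next}}')

# Literal substring: "a.b" may appear anywhere inside a field.
idx=$(printf '%s\n' "$data" | awk -v s='a.b' \
    '$1=="SCR"{for(i=7;i<=NF;i++) if(index($i,s)){print $6; next}}')

# Regex: the dot matches any character, so "axb" matches too.
re=$(printf '%s\n' "$data" | awk -v s='a.b' \
    '$1=="SCR"{for(i=7;i<=NF;i++) if($i ~ s){print $6; next}}')

printf 'equality: %s\n' "$(echo $eq)"
printf 'index():  %s\n' "$(echo $idx)"
printf 'regex:    %s\n' "$(echo $re)"
```

Equality selects only id2, index() selects id2 and id3, and the regex selects all three.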

Upvotes: 0

RARE Kpop Manifesto

Reputation: 2801

gsub() returns the number of substitutions it made, so one option is a check such as gsub(/[ ]+/, "&", $1) >= 6. Since the number of spaces between fields isn't much of a concern here, an alternate approach can look like this:

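# works with mawk, mawk2, or gawk
awk -v VAL1='_RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf' '
    BEGIN { FS = VAL1 }                       # split on the search string
    NF <= 1 || !/^[ \t]*SCR[ \t]/ { next }    # string absent, or first word is not SCR
    {
        FS = "[ \t]+"
        $0 = $1                               # re-split the text before the string
        if (NF >= 6) print $6
        FS = VAL1                             # restore for the next record
    }
' inputfile.txt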

Use the long string as the field delimiter. With default field splitting, $1 == "SCR" is equivalent to optional leading blanks (spaces or tabs), then "SCR", then another blank.

Drop the records that fail those up-front checks first. Then switch the field separator back to the standard one and check whether there are at least 6 fields.

3 more alternative approaches would be

(1) in the final code chunk, split into an array instead of re-working the separators, or

(2) don't change FS, but after $0 = $1, build a regex that repeats this 5 times:

       ([^ ]+[ \t]+)

(gawk supports the {5} interval syntax; mawk not really.) Apply sub() once to strip that off the left. If you can still find a blank afterwards, take everything before it: that's your $6.

(3) similar to alternative (2), but use a for loop to sub() the fields off the left side 5 times.
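As a rough sketch of alternative (1), the text before the search string can be split into an array instead of swapping FS back and forth. The sample data, the short search string, and the array name a are assumptions for illustration. Like the FS-switching version, this only checks that at least 6 whitespace-separated words precede the string, so it is not a strict "$7 or later" test.

```shell
data='SCR a b c d KEEP x _RSabc y
SCR a b c d DROP x y
FOO a b c d SKIP _RSabc
SCR a b _RSabc d EARLY x'

out_split=$(printf '%s\n' "$data" | awk -v VAL1='_RSabc' '
    BEGIN { FS = VAL1 }                       # the search string is the delimiter
    NF <= 1 || !/^[ \t]*SCR[ \t]/ { next }    # string absent, or first word is not SCR
    {
        n = split($1, a, " ")                 # whitespace-split the text before the string
        if (n >= 6) print a[6]
    }
')
printf '%s\n' "$out_split"
```

Here only the first line prints its sixth word; the last line is rejected because the string appears after just three words.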

Upvotes: 0

Ed Morton

Reputation: 203532

This might be what you're looking for:

awk '($1 == "SCR") && /^[[:space:]]*([^[:space:]]+[[:space:]]+){6}.*_RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf/{print $6}' file

depending on whether you need partial or full field matching and whether or not your search string can ever contain regexp metachars.
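If the search string can contain regexp metachars and partial-field matches are acceptable, one possible variation is to strip the first six fields from a copy of the record and test the remainder with index(). This is a sketch under those assumptions, not a drop-in replacement: the variable tail, the sample data, and the search string _RS.abc are made up, and a loop of sub() calls is used so the script does not depend on interval expressions.

```shell
data='SCR a b c d KEEP x pre_RS.abc y
SCR a b c d REGEX x _RSxabc
SCR a b c _RS.abc EARLY x'

out_tail=$(printf '%s\n' "$data" | awk -v p='_RS.abc' '
    $1 == "SCR" && NF >= 7 {
        tail = $0
        sub(/^[ \t]+/, "", tail)                 # drop any leading blanks
        for (i = 1; i <= 6; i++)
            sub(/^[^ \t]+[ \t]+/, "", tail)      # drop one field and its separator
        if (index(tail, p)) print $6             # literal, partial-field match
    }
')
printf '%s\n' "$out_tail"
```

Only the first line is selected: the second would match as a regex but not as a literal string, and the third has the string in $5, before the part being searched.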

Upvotes: 1
