Reputation: 41
There may be something on here like this but I haven't found it.
I have two conditions I'm trying to meet in order to print: one specific, one kind of open ended:
First, $1 == "SCR".
Second, the search string exists anywhere from $7 until the end of the line.
An example of the search string might be _RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf
So, in no particular syntax, I guess what I'm looking for is something like:
awk '$1 == "SCR" && _RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf exists anywhere in $7 to end of line {print $6}' inputfile.txt
Any help is welcome. Thanks.
Upvotes: 0
Views: 171
Reputation: 1871
index() is used for an exact match of the string:
awk -v p='_RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf' '
$1 == "SCR" {
f=$6
$1=$2=$3=$4=$5=$6=""
if (index($0, p))
print f
}
' inputfile.txt
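As long as the search string does not itself contain the field delimiters, we should be good to go: blanking $1 through $6 leaves leading delimiters in $0 and rebuilds the line with single-space separators.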
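To see what index() is actually searching after the first six fields are blanked, here is a small sketch. The sample record and the helper name show_rebuilt are made up for illustration; the point is only that assigning to a field makes awk rebuild $0 with OFS (a single space by default).

```shell
# Hypothetical helper: print the record as it looks after blanking $1..$6.
show_rebuilt() {
    awk '{
        $1=$2=$3=$4=$5=$6=""     # assigning to a field makes awk rebuild $0 using OFS
        printf "[%s]\n", $0      # brackets make the leading spaces visible
    }'
}

# Made-up sample record with runs of spaces between some fields.
printf 'SCR a b c d KEEP   x_RS1   tail\n' | show_rebuilt
# prints: [      x_RS1 tail]
```

The six leading spaces are the separators that used to sit after $1 to $6, and the runs of spaces further right have collapsed to one space each, so a search string containing spaces or tabs may no longer match.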
To fit the script on a single line:
awk -v p='_RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf' '$1 == "SCR" { f=$6; $1=$2=$3=$4=$5=$6=""; if (index($0, p)) print f }' inputfile.txt
I decided to have some fun with this little script today. Here is a side-effect-filled function variant that minimizes the use of global variables. I'd suggest that the previous solution is much more straightforward (and the one we want), but the following code is both fun and educational:
awk -v p='_RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf' '
function x(v, f, i) {
f=$6
$1=$2=$3=$4=$5=$6=""
i = index($0, v)
$0=f
return i
}
$1 == "SCR" && x(p)
' inputfile.txt
How to figure the above out:
Instead of print f on success, we get cute and just put f back into the line buffer.
Where is the print? Well, the final line of the script is where the magic happens. If no action is specified for a pattern, awk uses the default action, which is to print the line buffer. Success! :)
&& evaluates left to right and short-circuits; so there is no possibility of our x(p) function firing off first and destroying $1 before evaluation of $1 == "SCR".
Our favored language awk is not a threaded language, so with a little trickery we can repair this function and its implementation to be both efficient and "pure":
awk -v p='_RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf' '
function x(v, b, i) {
b=$0
$1=$2=$3=$4=$5=$6=""
i = index($0, v)
$0=b
return i
}
$1 == "SCR" && x(p) {
print $6
}
' inputfile.txt
Why is the above important? We want our functions not to mutate state and to return without side effects, so that execution can continue without worrying that the line buffer, or anything else, has changed. Yes, we have cheated in this last example by mutating global state and repairing it afterwards, but by the end it is repaired; so, "no harm, no foul," eh? In this new implementation we could in fact swap x(p) with $1 == "SCR" and have no problems (except that it would be slower, since x(p) would then run on every line). More complex follow-on programs, though, would benefit from a "pure" function, as it makes the program easier to reason about.
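A quick way to convince yourself of the reordering claim is the sketch below. It wraps the save-and-restore function in a hypothetical shell helper (pure_match) and deliberately puts x(p) first in the pattern; the short search string _RS1 and the three sample records are invented for the demo.

```shell
# Hypothetical helper: $1 is the literal search string, input comes from stdin.
pure_match() {
    awk -v p="$1" '
    function x(v,    b, i) {
        b = $0                        # save the record
        $1=$2=$3=$4=$5=$6=""          # blank the first six fields (rebuilds $0)
        i = index($0, v)              # literal search in what is left
        $0 = b                        # restore; awk re-splits the fields
        return i
    }
    x(p) && $1 == "SCR" { print $6 }  # x(p) runs first, yet $1 and $6 are intact
    '
}

# Record 1 matches in $7, record 2 only in $6, record 3 has the wrong $1.
printf 'SCR a b c d KEEP x_RS1 tail\nSCR a b c d _RS1 x y\nFOO a b c d SKIP _RS1\n' |
    pure_match _RS1
# prints: KEEP
```

Because the function restores $0 before returning, the second test still sees the original $1, which would not be true of the earlier side-effect variant.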
Upvotes: 1
Reputation: 67507
awk -v s='_RS1c80952b46129f' '$1=="SCR"{for(i=7;i<=NF;i++)
if($i==s) {print $6; next}}' file
Note that this compares each field as a whole against the string; it is an exact string match, not a regex.
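If it is unclear whether whole-field or partial matching is wanted, the difference can be seen side by side. This is only a sketch: the helper field_match and its "exact"/"partial" mode argument are my own invention, and the sample records are made up.

```shell
# Hypothetical helper: $1 = search string, $2 = "exact" or "partial".
field_match() {
    awk -v s="$1" -v mode="$2" '
    $1 == "SCR" {
        for (i = 7; i <= NF; i++) {
            # exact: the whole field must equal s; partial: s may sit inside the field
            hit = (mode == "exact") ? ($i == s) : (index($i, s) > 0)
            if (hit) { print $6; next }
        }
    }'
}

sample='SCR a b c d ONE _RS1 z\nSCR a b c d TWO pre_RS1post z\n'
printf "$sample" | field_match _RS1 exact     # prints ONE
printf "$sample" | field_match _RS1 partial   # prints ONE, then TWO
```

Both modes are literal comparisons, so regexp metacharacters in the search string have no special meaning in either.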
Upvotes: 0
Reputation: 2801
gsub() returns the number of substitutions it made, so one option is a check such as gsub(/[ ]+/, "&", $1) >= 6.
But since the number of spaces between fields isn't really a concern here, an alternate approach can be like:
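# works with any of mawk, mawk2, or gawk
awk -v VAL1='_RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf' '
BEGIN { FS = VAL1 }                        # split records on the search string
NF <= 1 || ! /^[ \t]*SCR[ \t]/ { next }    # string absent, or first word is not SCR
{
    FS = "[ \t]+"; $0 = $1                 # re-split the text before the first match
    if (NF >= 6) print $6
    FS = VAL1                              # restore the separator for the next record
}'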
Use the long string as the delimiter. With the default field splitting, $1 == "SCR" is equivalent to optional leading blanks (spaces or tabs), then "SCR", then a blank again.
First drop the records that fail those up-front tests. Then switch the field separator back to the standard one and check whether the text before the search string has at least 6 fields.
Three more alternative approaches would be:
(1) in the final code chunk, split() the text into an array instead of re-working the separators, or
(2) don't change FS, but after $0 = $1, build a regex that repeats ([^ ]+[ \t]+) five times (gawk offers {5}, mawk not really so) and sub() it once off the left. If you can still find a blank afterwards, take everything before it: that's your $6.
(3) similar to alternative (2), but use a for loop to sub() them away off the left side five times.
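Alternative (1) could look roughly like the sketch below. The helper name split_variant, the short search string and the sample records are made up. Two assumptions are mine rather than the answer's: the search string contains no regexp metacharacters (a multi-character FS is treated as a regex), and the check asks for seven pieces rather than six, so that a match starting inside $6 is rejected.

```shell
# Hypothetical helper: $1 is the search string, input comes from stdin.
split_variant() {
    awk -v VAL1="$1" '
    BEGIN { FS = VAL1 }                       # the search string is the delimiter
    NF <= 1 || ! /^[ \t]*SCR[ \t]/ { next }   # string absent, or first word is not SCR
    {
        n = split($1, part, /[ \t]+/)         # split the text before the first match
        if (n >= 7) print part[6]             # assumption: 7 pieces means the match is past $6
    }'
}

# Record 1 matches inside $7, record 2 inside $6, record 3 has the wrong first word.
printf 'SCR a b c d KEEP x_RS1 tail\nSCR a b c d NOPE_RS1 tail\nFOO a b c d SKIP _RS1\n' |
    split_variant _RS1
# prints: KEEP
```

Since FS never changes after BEGIN, there is nothing to restore at the end of each record.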
Upvotes: 0
Reputation: 203532
This might be what you're looking for:
awk '($1 == "SCR") && /^[[:space:]]*([^[:space:]]+[[:space:]]+){6}.*_RS1c80952b46129fc3bbf7b6d35e52f581d5d96edf/{print $6}' file
depending on whether you need partial or full field matching and whether or not your search string can ever contain regexp metachars.
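To illustrate the metachar caveat, here is a small sketch comparing a dynamic regex match with a literal index() search over fields 7 onward. The helper regex_vs_literal, the search string a.c and the sample records are made up; the tail is rejoined with single spaces, so original spacing is not preserved.

```shell
# Hypothetical helper: $1 is the search string, input comes from stdin.
regex_vs_literal() {
    awk -v s="$1" '
    $1 == "SCR" {
        rest = ""
        for (i = 7; i <= NF; i++) rest = rest " " $i    # fields 7..NF, space-joined
        # (rest ~ s) treats s as a regex; index() treats it as plain text
        printf "%s regex=%d literal=%d\n", $6, (rest ~ s), (index(rest, s) > 0)
    }'
}

printf 'SCR a b c d HIT x a.c y\nSCR a b c d MISS x abc y\n' | regex_vs_literal 'a.c'
# prints:
# HIT regex=1 literal=1
# MISS regex=1 literal=0
```

The dot matches any character under the regex test, so "abc" counts as a hit there but not under the literal search.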
Upvotes: 1