ThePresident

Reputation: 311

Filtering with awk based on two criteria

I had a similar question earlier but this time I need something a bit more sophisticated:

In a txt file that looks like this:

147 186741 2S74M -162
83 647172 1S75M -221
163 584665 74M2S 271
99 658416 5S65M6S -272
163 718735 60M16S 243

I want awk to look at the 3rd column: when it finds the character "S" in either the 2nd or 3rd position, it should then look at the 1st column and discard the line if that column is "147" or "83". The remaining lines are passed to a second awk, which looks at the 3rd column again: when it finds "S" at the end, it should look at the 1st column and discard the line if it is "99" or "163". Finally, it should print the lines that did not match either filter.

I tried something along these lines, but got a blank file:

awk -Ft '{if ($3 ~ /S$/  && $1 ~ /99|163/)} {next}' | awk -Ft '{if ($3 ~ /^..?S/ && $1 ~ /147|83/)} {next} $6 ~ /S/ {print}' input.txt > output.txt 

Upvotes: 3

Views: 7342

Answers (2)

RavinderSingh13

Reputation: 133518

Since you haven't shown the full Input_file you are using, I have based my example on the sample you posted and added a few extra lines for testing. Let's say the following is your Input_file:

cat Input_file
147 186741 2S74M -162
83 647172 1S75M -221
163 584665 74M2S 271
99 658416 5S65M6S -272
163 718735 60M16S 243
147 186741 2K74M -162
83 647172 1K75M -221
163 584665 74M2K 271
99 658416 5S65M6K -272
163 718735 60M16S 243

Here is my code:

awk '(($1==147 || $1==83) && (substr($3,2,1)=="S" || substr($3,3,1)=="S")) || (substr($3,length($3))=="S" && ($1==99 || $1==163)){next} 1'  Input_file

When I run the above awk, I get only the lines I added to check whether my code works, as follows:

awk '(($1==147 || $1==83) && (substr($3,2,1)=="S" || substr($3,3,1)=="S")) || (substr($3,length($3))=="S" && ($1==99 || $1==163)){next} 1' Input_file
147 186741 2K74M -162
83 647172 1K75M -221
163 584665 74M2K 271
99 658416 5S65M6K -272

As you can see, all the lines that do NOT meet the conditions you provided are printed. Give me some time and I will add an explanation here too.

EDIT: Adding an explanation of the above code. Kindly DO NOT run this as is; I have split it into sections only for the OP's understanding.

awk '(($1==147 || $1==83)\  ##First condition, which represents your first awk, starts here: checks whether $1 is either 147 OR 83.
&& \                        ##  AND
(substr($3,2,1)=="S" \      ##the 2nd character of the 3rd column is EQUAL to the letter S
|| \                        ##  OR
substr($3,3,1)=="S"))\      ##the 3rd character of the 3rd column is EQUAL to the letter S
|| \                        ##OR (either the first awk condition above or the following one must be TRUE). This is the second major condition, for which you used the second awk; I combined both awks into 2 major conditions here.
(substr($3,length($3))=="S"\##checks whether the last character of the 3rd column is EQUAL to S.
&& \                        ##  AND
($1==99 || $1==163)){       ##$1 is either 99 or 163. If either of the above 2 major conditions is TRUE, perform the following statement.
next                        ##next is an awk keyword that skips all further statements for the current line, without doing any action.
}
1                           ##awk works on a condition-then-action model. Here the condition is made TRUE by writing 1, and since NO action is given, the default action (print) prints the current line.
' Input_file                  ##mentioning Input_file here.
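For anyone who wants a version of the annotated logic that can actually be run, here is a minimal sketch of the same two conditions written as a multi-line awk program. The variable names s_near_start and s_at_end are my own, chosen only for illustration, and the heredoc simply recreates the Input_file shown above.

```shell
# Recreate the sample Input_file from this answer.
cat > Input_file <<'EOF'
147 186741 2S74M -162
83 647172 1S75M -221
163 584665 74M2S 271
99 658416 5S65M6S -272
163 718735 60M16S 243
147 186741 2K74M -162
83 647172 1K75M -221
163 584665 74M2K 271
99 658416 5S65M6K -272
163 718735 60M16S 243
EOF

out=$(awk '
{
    # "S" as the 2nd or 3rd character of column 3 (illustrative variable name).
    s_near_start = (substr($3, 2, 1) == "S" || substr($3, 3, 1) == "S")
    # "S" as the last character of column 3 (illustrative variable name).
    s_at_end = (substr($3, length($3)) == "S")

    # First filter: drop 147/83 lines with an early S.
    if (s_near_start && ($1 == 147 || $1 == 83)) next
    # Second filter: drop 99/163 lines with a trailing S.
    if (s_at_end && ($1 == 99 || $1 == 163)) next

    print
}' Input_file)

printf '%s\n' "$out"
```

On the sample data this should print the same four "K" lines as the one-liner. Splitting the tests into named variables makes it easier to adjust one filter without touching the other.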

Upvotes: 3

Marc Lambrichs

Reputation: 2892

For starters, the $6 is possibly a typo.

Now let's try to do this in steps. Step 1:

awk '$1 ~ /147|83/ && $3 ~ /^..?S/ {next;} {print;}' test.txt

leaves us with:

163 584665 74M2S 271
99 658416 5S65M6S -272
163 718735 60M16S 243

If you put these lines in a file test2.txt, then applying:

awk '($1 ~ /99|163/ && $3 ~ /S$/) {next;} {print;}' test2.txt

leaves us with no valid lines, because every remaining line has a 3rd column ending in 'S' and a 1st column of either 99 or 163.
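The two steps can also be chained with a pipe instead of an intermediate file. The following is only a sketch under a couple of assumptions: I anchored the first-column patterns as /^(147|83)$/ and /^(99|163)$/, since an unanchored /147|83/ would also match a value such as 1470, and I appended one extra line (taken from the other answer's test data) so that something survives both filters.

```shell
# Sample data from the question, plus one extra line that should pass both filters.
cat > test.txt <<'EOF'
147 186741 2S74M -162
83 647172 1S75M -221
163 584665 74M2S 271
99 658416 5S65M6S -272
163 718735 60M16S 243
147 186741 2K74M -162
EOF

# Step 1 drops 147/83 lines with S in position 2 or 3 of column 3;
# step 2 drops 99/163 lines whose column 3 ends in S.
out=$(awk '$1 ~ /^(147|83)$/ && $3 ~ /^..?S/ {next} {print}' test.txt |
      awk '$1 ~ /^(99|163)$/ && $3 ~ /S$/ {next} {print}')

printf '%s\n' "$out"
```

Note that only the first awk is given the file name; the second one reads the first one's output from the pipe.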

Upvotes: 3
