SonicProtein

Reputation: 850

Output the line number when there is a matching value, for each column

Say I've got a file, file.txt:

Position name1 name2 name3
       2     A     G     F
       4     G     S     D
       5     L     K     P
       7     G     A     A
       8     O     L     K
       9     E     A     G

and I need to get the output:

name1 name2 name3
    2     2     7
    4     7     9
    7     9

The output lists, under each name, the position numbers where that column holds an A or a G.

In file.txt, the name1 column has an A at position 2 and G's at positions 4 and 7, so 2, 4 and 7 are listed under name1 in the output, and so on for the other columns.

Strategy I've devised so far (not very efficient): read each column one at a time and output the position number whenever a match occurs. Then I'd take the result for each column and cbind them together in R.

I'm fairly certain there's a better way using awk or bash... ideas appreciated.

Upvotes: 0

Views: 73

Answers (2)

sjsam

Reputation: 21965

Save the script below as finder:

#!/bin/bash
# One temporary file per column; ">" truncates each on first open, so
# leftovers from an earlier run are not carried over.
gawk 'NR == 1 { print $2 > "name1"; print $3 > "name2"; print $4 > "name3"; next }
      $2 == "A" || $2 == "G" { print $1 > "name1" }
      $3 == "A" || $3 == "G" { print $1 > "name2" }
      $4 == "A" || $4 == "G" { print $1 > "name3" }
      END {
          close("name1"); close("name2"); close("name3")  # flush before paste reads them
          system("paste name1 name2 name3; rm name1 name2 name3")
      }' "$1"

Make finder executable (chmod +x finder) and then run:

./finder file.txt

Note: this uses three temporary files, name1, name2 and name3, in the current directory. You can change the file names if they clash with anything. The files are deleted at the end.
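If the fixed file names in the current directory are a concern, one option is to put the per-column files in a private temporary directory instead. This is only a sketch of that variation, assuming mktemp -d is available on your system; the function name find_positions and the colNNN file names are made up for illustration:

```shell
#!/bin/bash
# Sketch only: find_positions is a hypothetical helper name.
# Assumes the first line of the input is the header and that
# mktemp -d is available.
find_positions() {
    local tmpdir
    tmpdir=$(mktemp -d) || return 1

    awk -v dir="$tmpdir" '
        {
            for (i = 2; i <= NF; i++) {
                # Zero-padded names keep the glob below in column order.
                f = sprintf("%s/col%03d", dir, i)
                if (NR == 1)
                    print $i > f        # header: the column name
                else if ($i == "A" || $i == "G")
                    print $1 > f        # data: the position number
            }
        }
    ' "$1"

    # awk has exited here, so the files are fully written.
    paste "$tmpdir"/col*
    rm -r "$tmpdir"
}
```

Call it as find_positions file.txt. It handles any number of name columns, and the output is tab-separated because that is what paste produces by default.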

Edit: removed the BEGIN part of the gawk script.

Upvotes: 1

Ed Morton

Reputation: 204558

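$ cat tst.awk
# Header line: print each name right-aligned in a 5-character column.
NR==1 {
    for (nameNr=2;nameNr<=NF;nameNr++) {
        printf "%5s%s", $nameNr, (nameNr<NF?OFS:ORS)
    }
    next
}
# Data lines: for every name column holding exactly A or G, record the position ($1).
{
    for (nameNr=2;nameNr<=NF;nameNr++) {
        if ($nameNr ~ /^[AG]$/) {
            # Store this position as the next hit for the column.
            hits[nameNr,++numHits[nameNr]] = $1
            # Track the length of the longest column of hits.
            maxHits = (numHits[nameNr] > maxHits ? numHits[nameNr] : maxHits)
        }
    }
}
# Print the hits row by row; columns that have run out of hits print as blanks.
END {
    for (hitNr=1; hitNr<=maxHits; hitNr++) {
        for (nameNr=2;nameNr<=NF;nameNr++) {
            printf "%5s%s", hits[nameNr,hitNr], (nameNr<NF?OFS:ORS)
        }
    }
}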

$ awk -f tst.awk file
name1 name2 name3
    2     2     7
    4     7     9
    7     9
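If the set of letters to match might change, the same logic can take them as a parameter instead of hard-coding [AG]. A rough sketch under that assumption: positions_of and the letters variable are hypothetical names, and the letters are dropped straight into a bracket expression, so characters such as ] or ^ would need extra care. It also saves the header's field count in nf rather than relying on NF inside END, which some older awks may not preserve.

```shell
#!/bin/bash
# Sketch only: positions_of and "letters" are illustrative names.
# Usage: positions_of LETTERS FILE   (e.g. positions_of AG file.txt)
positions_of() {
    awk -v letters="$1" '
        BEGIN { re = "^[" letters "]$" }   # match a field that is exactly one of the letters
        NR == 1 {
            nf = NF                        # remember the column count for END
            for (c = 2; c <= NF; c++)
                printf "%5s%s", $c, (c < NF ? OFS : ORS)
            next
        }
        {
            for (c = 2; c <= NF; c++)
                if ($c ~ re) {
                    hits[c, ++n[c]] = $1   # nth matching position for column c
                    if (n[c] > max) max = n[c]
                }
        }
        END {
            for (h = 1; h <= max; h++)
                for (c = 2; c <= nf; c++)
                    printf "%5s%s", hits[c, h], (c < nf ? OFS : ORS)
        }
    ' "$2"
}
```

With the sample file, positions_of AG file.txt should give the same table as above.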

Upvotes: 3
