Reputation: 850
Say I've got a file.txt
Position name1 name2 name3
2 A G F
4 G S D
5 L K P
7 G A A
8 O L K
9 E A G
and I need to get the output:
name1 name2 name3
2 2 7
4 7 9
7 9
It outputs each name, and the position numbers where there is an A or G
In file.txt, the name1 column has an A in position 2, G's in positions 4 and 7... therefore in the output file: 2,4,7 is listed under name1 ...and so on
Strategy I've devised so far (not very efficient): reading each column one at a time, and outputting the position number when a match occurs. Then I'd get the result for each column and cbind them together using r.
I'm fairly certain there's a better way using awk or bash... ideas appreciated.
Upvotes: 0
Views: 73
Reputation: 21965
Save the below script :
#!/bin/bash
gawk '{if( NR == 1 ) {print $2 >>"name1"; print $3 >>"name2"; print $4>>"name3";}}
{if($2=="A" || $2=="G"){print $1 >> "name1"}}
{if($3=="A" || $3=="G"){print $1 >> "name2"}}
{if($4=="A" || $4=="G"){print $1 >> "name3"}}
END{system("paste name*;rm name*")}' $1
as finder
. Make finder an executable(using chmod) and then do :
./finder file.txt
Note : I have used three temporary files name1, name2 and name3. You could change the file names at your convenience. Also, these files will be deleted at the end.
Edit : Removed the BEGIN
part of the gawk.
Upvotes: 1
Reputation: 204558
$ cat tst.awk
NR==1 {
for (nameNr=2;nameNr<=NF;nameNr++) {
printf "%5s%s", $nameNr, (nameNr<NF?OFS:ORS)
}
next
}
{
for (nameNr=2;nameNr<=NF;nameNr++) {
if ($nameNr ~ /^[AG]$/) {
hits[nameNr,++numHits[nameNr]] = $1
maxHits = (numHits[nameNr] > maxHits ? numHits[nameNr] : maxHits)
}
}
}
END {
for (hitNr=1; hitNr<=maxHits; hitNr++) {
for (nameNr=2;nameNr<=NF;nameNr++) {
printf "%5s%s", hits[nameNr,hitNr], (nameNr<NF?OFS:ORS)
}
}
}
$ awk -f tst.awk file
name1 name2 name3
2 2 7
4 7 9
7 9
Upvotes: 3