HKJ3

Reputation: 477

Awk to replace a value in the header with the value next to it?

I have a compressed text file (chrall.txt.gz) that looks like the sample below. It has a header line containing a pair of IDs for each individual, e.g. 1032 and 468768 are the IDs for one individual. There are 1931 individuals in the file, therefore 3862 IDs in total. The next individual would be 1405 468769, and so on.

After the header there are 21465139 lines. I am not interested in the body of the file, just the header.

misc SNP pos A2 A1 1032 468768 1405 468769 1564 468770 1610 468771 998 468774 975 468775 1066 468776 1038 468778 1275 468781 999 468782 976 468783 1145 468784 1141 468786 1280 468789 910 468790 978 468791 1307 468792 ...

--- rs1038757:1072:T:TA 1072 TA T 1.113 0.555 1.612 0.519 0.448 0.653 1.059 0.838 1.031 0.518 1.046 0.751 1.216 1.417 1.008 0.917 0.64 1.04 1.113 1.398 1.173 0.956 …

I want to replace the first ID of every pair (e.g. 1032, 1405, 1564, 1610, 998, 975) with the ID next to it. So the 1st, 3rd, 5th, 7th, 9th ID etc. is replaced by the ID that follows it, and the header looks like this:

misc SNP pos A2 A1 468768 468768 468769 468769 468770 468770 468771 468771 468774 468774 468775 468775 468776 468776 468778 468778 468781 468781 468782 468782 468783 468783 468784 468784 468786 468786 468789 468789 468790 468790 468791 468791 468792 468792 

etc..

I am completely stumped on how to do this. My guess is to use awk/gsub and replace every nth occurrence (1, 3, 5, 7, 9, ...) with the value next to it. I also need to leave the first five fields alone: misc SNP pos A2 A1

My working out:

Read first line and ignore first 5 fields:

awk FNR==1'{ $1=""; $2=""; $3=""; $4=""; $5="";}'

Someone used this code to replace every 3rd occurrence of a character. I assume I would change the 3 to a 2, since I want to replace every 2nd occurrence, but the problem is that I also want to replace the first ID...

awk '{ c=0; for (i = 0; ++i <= NF;){ if( $i == v){c++;if(c%3==0){ $i = l }} } }1' OFS= FS= n=3 v=a l=c

replace nth occurrence of character in a file using awk regardless of the line

I am not sure how to adapt it to my case...

Upvotes: 0

Views: 53

Answers (2)

RARE Kpop Manifesto

Reputation: 2855

{m,g}awk -F'^.+[A-Za-z][0-9]+ +[0-9]+ +' '!_<NR ||

$!NF = sprintf("%.*s%s%.0s",(___ = substr($_,++_,-_+index($!_, $++_)))* \
        sub("[ ]*[^ ]+ *$",_="",___) * sub("^"(__="[0-9]+"),"_",$!(NF = NF)),
       gsub(" "__" "," ")*gsub("_",_)*gsub(" "__,"&&"), ___$_,FS="^$")' OFS=' _'
misc SNP pos A2 A1 468768 468768 468769 468769 468770 468770 468771 468771 468774 468774 468775 468775 468776 468776 468778 468778 468781 468781 468782 468782 468783 468783 468784 468784 468786 468786 468789 468789 468790 468790 468791 468791 468792 468792
    
--- rs1038757:1072:T:TA 1072 TA T 1.113 0.555 1.612 0.519 0.448 0.653 1.059 0.838 1.031 0.518 1.046 0.751 1.216 1.417 1.008 0.917 0.64 1.04 1.113 1.398 1.173 0.956 …

Upvotes: 0

Ed Morton

Reputation: 203899

If you don't want to replace the first 5 fields then just don't include them in the loop by starting it at 6, and if you want to replace every 2nd field then just increment the loop variable by 2 on each iteration:

$ awk 'NR==1{for (i=6;i<NF;i+=2) $i=$(i+1)} 1' file
misc SNP pos A2 A1 468768 468768 468769 468769 468770 468770 468771 468771 468774 468774 468775 468775 468776 468776 468778 468778 468781 468781 468782 468782 468783 468783 468784 468784 468786 468786 468789 468789 468790 468790 468791 468791 468792 468792 ...

--- rs1038757:1072:T:TA 1072 TA T 1.113 0.555 1.612 0.519 0.448 0.653 1.059 0.838 1.031 0.518 1.046 0.751 1.216 1.417 1.008 0.917 0.64 1.04 1.113 1.398 1.173 0.956 …
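Since the question says the file is gzip-compressed and only the header is wanted, the same loop can be fed from a decompression pipe and told to stop after line 1, so the 21465139 body lines are never processed. The following is a minimal sketch under those assumptions: sample.txt.gz and new_header are hypothetical names used only for illustration, and the tiny sample stands in for chrall.txt.gz.

```shell
# Build a tiny stand-in for chrall.txt.gz (hypothetical sample data).
printf '%s\n' \
  'misc SNP pos A2 A1 1032 468768 1405 468769 1564 468770' \
  '--- rs1038757:1072:T:TA 1072 TA T 1.113 0.555 1.612 0.519 0.448 0.653' |
  gzip -c > sample.txt.gz

# Decompress to a pipe, overwrite the first ID of each pair (fields 6, 8, 10, ...)
# with the ID after it, print the rewritten header, and exit without reading the body.
new_header=$(gzip -dc sample.txt.gz |
  awk 'NR==1 { for (i = 6; i < NF; i += 2) $i = $(i+1); print; exit }')

printf '%s\n' "$new_header"
```

Assigning to a field makes awk rebuild the line with single spaces between fields, which matches the space-separated header shown in the question. If the whole file with the new header is needed instead, drop the `print; exit` and keep the trailing `1` from the command above.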

Upvotes: 2
