I have a tab-delimited file that looks like this:
CHROM <TAB> POS <TAB> AD0062-C <TAB> AD0063-C <TAB> AD0065-C <TAB> AD0074-C
2L <TAB> 440 <TAB> 0/1:63:60,0,249 <TAB> 0/1:89:86,0,166 <TAB> 1/1:96:107,24,0 <TAB> 1/1:49:42,6,0
2L <TAB> 260 <TAB> 0/1:66:63,0,207 <TAB> 1/1:99:227,111,0 <TAB> 1/1:99:255,144,0 <TAB> 1/1:49:42,6,0
2L <TAB> 595 <TAB> 0/1:11:85,0,8 <TAB> 0/1:13:132,0,10 <TAB> 0/1:73:70,0,131 <TAB> 0/1:59:72,0,56
I want to keep only the first 3 characters of each field from column 3 onward, leaving the header line unchanged, so that the output looks like this:
CHROM <TAB> POS <TAB> AD0062-C <TAB> AD0063-C <TAB> AD0065-C <TAB> AD0074-C
2L <TAB> 440 <TAB> 0/1 <TAB> 0/1 <TAB> 1/1 <TAB> 1/1
2L <TAB> 260 <TAB> 0/1 <TAB> 1/1 <TAB> 1/1 <TAB> 1/1
2L <TAB> 595 <TAB> 0/1 <TAB> 0/1 <TAB> 0/1 <TAB> 0/1
Thanks
This might work for you (GNU sed):
sed '1b;s/\(\S\{3\}\)\S*/\1/2g' file
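To try the command above on its own, here is a minimal sketch that wraps the one-liner in a shell function (trim_sed1 is a hypothetical name used only for this illustration), annotates each part of the script, and feeds it a few rows from the question built with printf. It assumes GNU sed, because \S and combining a number with g in the s flags are GNU extensions. Note that 2g skips the first run of three or more non-blank characters; with this data that is POS, because 2L is too short to match. If CHROM were three or more characters long (for example chr2L), POS would become the second match and be cut to three characters as well.

```shell
#!/bin/sh
# Sketch: trim every field after the first two to 3 characters, header untouched.
# trim_sed1 is a hypothetical helper name; requires GNU sed (\S and "2g").
trim_sed1() {
  # 1b          : on line 1 (the header), branch to the end and print it as-is
  # \(\S\{3\}\) : capture the first three non-blank characters of a field
  # \S*         : the rest of that field, which is dropped
  # 2g          : skip the 1st match (POS here, since "2L" is too short to
  #               match) and replace the 2nd and every later match
  sed '1b;s/\(\S\{3\}\)\S*/\1/2g' "$@"
}

# Header plus two data rows from the question, tab-separated.
printf 'CHROM\tPOS\tAD0062-C\tAD0063-C\n2L\t440\t0/1:63:60,0,249\t0/1:89:86,0,166\n2L\t260\t0/1:66:63,0,207\t1/1:99:227,111,0\n' | trim_sed1
```

The function passes its arguments straight to sed, so it can also be called with a file name, e.g. trim_sed1 file.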
One way, using GNU sed: from the second line to the last, replace each tab and the field after it with the tab plus the field's first three characters. The 2g flag skips the first match (the tab before POS) and applies the substitution to every later one; CHROM has no tab in front of it, so it is never touched:
sed '2,$ s/\([\t]...\)[^\t]*/\1/2g' infile
Output:
CHROM POS AD0062-C AD0063-C AD0065-C AD0074-C
2L 440 0/1 0/1 1/1 1/1
2L 260 0/1 1/1 1/1 1/1
2L 595 0/1 0/1 0/1 0/1
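A minimal runnable sketch of the sed command above, wrapped in a hypothetical helper named trim_sed2 and fed tab-separated rows from the question. It assumes GNU sed, which (outside POSIXLY_CORRECT mode) recognizes \t inside a bracket expression. One caveat: ... matches any three characters, tabs included, so a field shorter than three characters would pull the following tab into the match; every field in the question's data is long enough for this not to arise.

```shell
#!/bin/sh
# Sketch of the second GNU sed answer; trim_sed2 is a hypothetical helper name.
trim_sed2() {
  # 2,$        : only lines 2..last, so the header is printed unchanged
  # [\t]...    : a tab plus the next three characters (GNU sed accepts \t
  #              inside a bracket expression); captured as \1
  # [^\t]*     : the rest of the field up to the next tab, which is dropped
  # 2g         : the 1st match is the tab before POS, so start at the 2nd
  sed '2,$ s/\([\t]...\)[^\t]*/\1/2g' "$@"
}

printf 'CHROM\tPOS\tAD0062-C\tAD0063-C\n2L\t595\t0/1:11:85,0,8\t0/1:13:132,0,10\n' | trim_sed2
```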
Using awk: for every line except the first that has more than two fields, replace each field from the third onward with its first three characters. The final print block has no condition, so it runs for every line, including the header.
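awk '
BEGIN { OFS = "\t" }              # rebuild modified lines with tabs
NF > 2 && FNR > 1 {               # skip the header and lines with < 3 fields
    for ( i=3; i<=NF; i++ ) {
        $i = substr( $i, 1, 3 )   # keep the first 3 characters of fields 3..NF
    }
}
{ print }                         # no condition: print every line
' infile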
Output:
CHROM POS AD0062-C AD0063-C AD0065-C AD0074-C
2L 440 0/1 0/1 1/1 1/1
2L 260 0/1 1/1 1/1 1/1
2L 595 0/1 0/1 0/1 0/1
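A self-contained sketch of the awk approach above, wrapped in a hypothetical helper named trim_awk. It differs from the script above in one labeled assumption: FS is also set to a tab, on the assumption that the file is strictly tab-delimited, so a field that happened to contain spaces would still be treated as one field. Only POSIX awk features (FS, OFS, FNR, NF, substr) are used.

```shell
#!/bin/sh
# Sketch of the awk answer; trim_awk is a hypothetical helper name.
# Unlike the original, FS is set to a tab as well (assumes a strictly
# tab-delimited file), so a field containing spaces stays whole.
trim_awk() {
  awk '
    BEGIN { FS = "\t"; OFS = "\t" }
    FNR > 1 && NF > 2 {
      # keep the first 3 characters of fields 3..NF; assigning to $i
      # rebuilds the line with OFS between fields
      for (i = 3; i <= NF; i++) $i = substr($i, 1, 3)
    }
    { print }   # no condition: every line, header included, is printed
  ' "$@"
}

printf 'CHROM\tPOS\tAD0062-C\tAD0065-C\n2L\t440\t0/1:63:60,0,249\t1/1:96:107,24,0\n' | trim_awk
```

Because FNR rather than NR is tested, passing several files (trim_awk a.tsv b.tsv) leaves each file's header line unchanged.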