Reputation: 121
I'm having trouble with awk. I want to split a file into two files. It mostly works, but I have one last issue.
This is one of my input files:
samplexxx EH Tred GangSTR
dijen006 nofile nofile nofile
dijen006_100 22,30 22,27 19,25
dijen006_75 25,27 29 NA
dijen017 nofile nofile nofile
dijen017_100 75,121 54 24,24
dijen017_75 74,131 72 19,19
dijen081 63,84 32 40,40
dijen081_100 70,115 78 25,41
dijen081_75 79,143 95 24,104
dijen082 47,51 38 15,34
dijen082_100 46,61 52 6,32
dijen082_75 NA 55 17,17
dijen083 30,53 30,40 38,38
dijen083_100 43,53 30,59 23,32
dijen083_75 43,60 18,74 23,71
dijen1013 30 30 20,30
dijen1013_100 30 30 9,19
dijen1013_75 21 33 20,20
dijen1014 9,30 9,30 9,30
dijen1014_100 9,28 9,43 9,11
dijen1014_75 9,28 9,36 9,29
dijen1015 23,30 23,30 23,29
dijen1015_100 23,30 NA 13,22
dijen1015_75 25,27 21,42 22,39
dijen402 25,31 25,31 25,31
dijen402_100 30 29,36 14,30
dijen402_75 25,26 22,39 22,39
I am using this code:
#!/bin/awk -f
# USAGE: awk -v my_var="$(basename "$i" .tsv)" -f split_file_allelle.awk "$i"
BEGIN { FS=OFS="\t" }
NR == 1 {
    str1 = str2 = $0
}
NR > 1 {
    str1 = str2 = $1
    for (i=2; i<=NF; i++) {
        split($i,a,/,/)
        str1 = str1 OFS a[1]
        str2 = str2 OFS a[2]
    }
}
{
    print str1 > my_var"_all1.tsv"
    print str2 > my_var"_all2.tsv"
}
and I get two files, split on the ",". The second one looks like the output below. Is there a way to get something like 'NA' instead of a blank in the second file wherever there is no number?
samplexxx EH Tred GangSTR
dijen006
dijen006_100 30 27 25
dijen006_75 27
dijen017
dijen017_100 121 24
dijen017_75 131 19
dijen081 84 40
dijen081_100 115 41
dijen081_75 143 104
dijen082 51 34
dijen082_100 61 32
dijen082_75 17
dijen083 53 40 38
dijen083_100 53 59 32
dijen083_75 60 74 71
dijen1013 30
dijen1013_100 19
dijen1013_75 20
dijen1014 30 30 30
dijen1014_100 28 43 11
dijen1014_75 28 36 29
dijen1015 30 30 29
dijen1015_100 30 22
dijen1015_75 27 42 39
dijen402 31 31 31
dijen402_100 36 30
dijen402_75 26 39 39
This is what I get, but I would like something like this:
samplexxx EH Tred GangSTR
dijen006 NA NA NA
dijen006_100 30 27 25
dijen006_75 27 NA NA
dijen017 NA NA NA
dijen017_100 121 NA 24
....
Thanks for your help!
Upvotes: 1
Views: 49
Reputation: 204280
BEGIN {
    FS = OFS = "\t"
    all1 = my_var "_all1.tsv"
    all2 = my_var "_all2.tsv"
}
NR == 1 {
    str1 = str2 = $0
}
NR > 1 {
    str1 = str2 = $1
    for (i=2; i<=NF; i++) {
        n = split($i,a,",")
        str1 = str1 OFS a[1]
        str2 = str2 OFS (n == 1 ? "NA" : a[2])
    }
}
{
    print str1 > all1
    print str2 > all2
}
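If you want to see what the n == 1 test does on its own, here is a minimal sketch that prints only the second-allele line to stdout instead of writing files. The function name second_alleles is made up for this illustration, and the sample line is one row from your input:

```shell
# second_alleles is a hypothetical helper name, used only for this illustration.
second_alleles() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        out = $1
        for (i = 2; i <= NF; i++) {
            n = split($i, a, ",")                  # n is the number of pieces found
            out = out OFS (n == 1 ? "NA" : a[2])   # a single value has no second allele
        }
        print out
    }'
}

# One tab-separated row from the question: the fields are 25,27 then 29 then NA
printf 'dijen006_75\t25,27\t29\tNA\n' | second_alleles
```

That should print dijen006_75 followed by 27, NA and NA, tab-separated, which matches the row in your desired output.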
It wasn't necessary to change print str1 > my_var"_all1.tsv" to print str1 > all1 to solve the specific problem you asked about; the ternary that tests split()'s return value does that. BUT print str1 > my_var"_all1.tsv" is undefined behavior per POSIX, so it would fail in some awks. It needs to be written either using a variable, as I have, or with parentheses around the expression that generates the file name: print str1 > (my_var"_all1.tsv"). Using a variable and doing the concatenation once in total instead of once per line is also more efficient.
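For completeness, the parenthesized form could look roughly like the sketch below. It is stripped down to just the redirection. The function name demo_paren_redirect, the two-column input and the temporary directory are assumptions for the demo, not part of your script:

```shell
# demo_paren_redirect is a hypothetical name used only for this illustration.
demo_paren_redirect() {
    # $1: prefix for the two output files (plays the role of my_var)
    printf 'first\tsecond\n' |
    awk -v my_var="$1" 'BEGIN { FS = "\t" }
    {
        # Parentheses group the concatenation, so the redirection target
        # is the whole file name rather than just my_var.
        print $1 > (my_var "_all1.tsv")
        print $2 > (my_var "_all2.tsv")
    }'
}

tmp=$(mktemp -d)
demo_paren_redirect "$tmp/demo"
cat "$tmp/demo_all1.tsv" "$tmp/demo_all2.tsv"
```

The concatenation still happens once per print here, so the variable version is the one I'd keep in a real script.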
Upvotes: 2