Reputation: 3085
I am trying to modify a tab delimited file by replacing the value of 2 columns separated by ":" where there is a null value in one column:
Start with this file:
id1 id2 pos1 pos2
749 2847443 21 13517135
862 2821796 21 13571669
997 21 13636494
1095 2821826 21 13661335
1131 21 13678797
I would like to modify column 2 to get this:
id1 id2 pos1 pos2
749 2847443 21 13517135
862 2821796 21 13571669
997 21:13636494 21 13636494
1095 2821826 21 13661335
1131 21:13678797 21 13678797
The problem is also that there is no blank space (" ") indicating a null value. For now I have not even been able to substitute the value of column 2 with column 3 when it is null...
I have tried using sed:
sed -e 's/\t\t/\t$3\t/g' input.txt > output
Which works but it just substitutes the text '$3' and I can't find how to substitute the values of $3:$4 instead.
I have also tried awk:
awk 'BEGIN {
IFS = OFS = "\t"
}
{
for (column = 2; column <= NF; ++column) {
if ($column == "") {
$column = $3
}
}
print
}
'
input.txt > output
But this does not work either (it does nothing actually neither using "" nor " "...)
Can you please help me? Thank you.
Upvotes: 0
Views: 2826
Reputation: 195209
checking null value is actually easy. but I didn't understand the requirement well. which column could be null? in your awk script, you have loop from $2->$NF, if there is null column, you didn't set to ":" separated values, but set with $3. what about $3 is null?
I assume that only $2 (column2) could be null, then the following awk line should do the job.
awk -F'\t' -vOFS='\t' '!$2{$2=$3":"$4}1' file
test
<ff is your input file>
kent$ awk -F'\t' -vOFS='\t' '!$2{$2=$3":"$4}1' ff
id1 id2 pos1 pos2
749 2847443 21 13517135
862 2821796 21 13571669
997 21:13636494 21 13636494
1095 2821826 21 13661335
1131 21:13678797 21 13678797
to see it clearly, we could pipe the output to column command:
kent$ awk -F'\t' -vOFS='\t' '!$2{$2=$3":"$4}1' ff|column -t
id1 id2 pos1 pos2
749 2847443 21 13517135
862 2821796 21 13571669
997 21:13636494 21 13636494
1095 2821826 21 13661335
1131 21:13678797 21 13678797
hope it is helpful for you.
Upvotes: 3
Reputation: 17654
looking at the before/after text you posted
B: 997 21 13636494
A: 997 21:13636494 21 13636494
you want to replace the 2nd column if null
with the 3rd + '**:**' + 4th column
, right ?
so go for this:
sed 's/\(.*\)\t\t\(.*\)\t\(.*\)/\1\t\2:\3\t\2\t\3/g' testfile
that is match the line which has
something tabtab something tab something
and replace it with
1st-column tab 3rd-column:4th-column tab 3rd-column tab 4th-column
example:
$ cat testfile
749 2847443 21 13517135
862 2821796 21 13571669
997 21 13636494
1095 2821826 21 13661335
1131 21 13678797
$ sed 's/\(.*\)\t\t\(.*\)\t\(.*\)/\1\t\2:\3\t\2\t\3/g' testfile
749 2847443 21 13517135
862 2821796 21 13571669
997 21:13636494 21 13636494
1095 2821826 21 13661335
1131 21:13678797 21 13678797
Note: this looks for a missing 2nd column only, as what you talked about
PS: if you think that answers your question, don't forget to mark it as the correct answer
Upvotes: 2