Reputation: 81
Hello: I have tab-separated data of the form customer, item description, purchase price, category.
e.g. a.out contains:
1\t400 Bananas\t3.00\tfruit
2\t60 Oranges\t0.00\tfruit
3\tNULL\t3.0\tfruit
4\tCarrots\tNULL\tfruit
5\tNULL\tNULL\tfruit
I'm attempting to get rid of all the NULL fields. I can't rely on simply replacing the string "NULL", as it may appear as a substring of a real value, so I am attempting
sed -i 's:\tNULL\t:\t\t:g' a.out
When I do this, I end up with
1\t400 Bananas\t3.00\tfruit
2\t60 Oranges\t0.00\tfruit
3\t\t3.0\tfruit
4\tCarrots\t\tfruit
5\t\tNULL\tfruit
What's wrong here is that on line 5 only the first of the two NULLs was replaced, even though I used the g flag.
If I run my sed command twice, I end up with the result I want:
1\t400 Bananas\t3.00\tfruit
2\t60 Oranges\t0.00\tfruit
3\t\t3.0\tfruit
4\tCarrots\t\tfruit
5\t\t\tfruit
where you can see that line 5 has both NULLs removed. But I don't understand why I need the second pass.
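The cause is that with the g flag, sed resumes scanning right after the end of the previous match and never re-examines characters it has already consumed. On line 5 the first match \tNULL\t swallows the tab that separates the two NULLs, so the second NULL is no longer preceded by a tab and cannot match. A minimal reproduction, assuming GNU sed (which understands \t in both pattern and replacement); the variable names are just for illustration:

```shell
#!/bin/sh
# Reproduce the problem on the single record from line 5.
# Assumes GNU sed, which expands \t in the pattern and the replacement.
line=$(printf '5\tNULL\tNULL\tfruit')

# One pass with the g flag: the first match consumes "\tNULL\t",
# so the second NULL has no leading tab left and is skipped.
one_pass=$(printf '%s\n' "$line" | sed 's:\tNULL\t:\t\t:g')

# A second pass finds "\tNULL\t" again and removes the remaining NULL.
two_pass=$(printf '%s\n' "$one_pass" | sed 's:\tNULL\t:\t\t:g')

# sed's l command shows tabs as \t and the end of each line as $:
#   5\t\tNULL\tfruit$
#   5\t\t\tfruit$
printf '%s\n' "$one_pass" "$two_pass" | sed -n l
```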
Upvotes: 2
Views: 174
Reputation: 2374
From grep(1) on a recent Linux:
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word [...]
So, how about:
sed -i 's:\<NULL\>::g' a.out
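A quick check of this suggestion on a few records, assuming GNU sed (\<, \> and \t are GNU extensions). Because \< and \> match empty strings, they consume no characters, so two adjacent NULL fields are both removed in a single pass. A field such as NULLABLE is left alone, but note that a space also counts as a word boundary, so a made-up value like 400 NULL Bananas loses its NULL too; whether that matters depends on the data. The helper name strip_word_nulls and the extra records are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: delete whole-word NULLs using GNU sed word boundaries.
strip_word_nulls() {
    sed 's:\<NULL\>::g'
}

# Expected (tabs shown as \t, line ends as $):
#   5\t\t\tfruit$
#   4\tNULLABLE\t\tfruit$
#   6\t400  Bananas\t\tfruit$
printf '5\tNULL\tNULL\tfruit\n4\tNULLABLE\tNULL\tfruit\n6\t400 NULL Bananas\tNULL\tfruit\n' \
    | strip_word_nulls | sed -n l
```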
Upvotes: 0
Reputation: 47573
Since a tab can't appear inside a value in your case (it would start a new field), you might be able to do what you want simply with this:
sed ':start ; s/\tNULL\(\t\|$\)/\t\1/ ; t start' a.out
First, the inner part s/\tNULL\(\t\|$\)/\t\1/ searches for a tab, then NULL, followed by either a tab or the end of the line ($), and replaces the match with a tab followed by whatever appeared after NULL (this last part is done using the back-reference \1). We'll refer to it as expression.
We now have:
sed ':start ; expression ; t start' a.out
This is effectively a loop (like goto). :start is a label, and ; acts as a statement delimiter. I have described what expression does above. t start says that IF the expression made any substitution, sed jumps to the label start; the pattern buffer then holds the substituted text. The loop repeats until no more substitutions can be made on the line, and then processing continues.
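For readability, the same label/substitute/branch loop can be written as a multi-line sed script with one command per line and comments in between. This is a sketch assuming GNU sed (\t and \| are GNU extensions); the helper name strip_nulls_loop and the second test record, which ends in NULL to exercise the $ branch, are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper wrapping the :start / s/// / t start loop described above.
strip_nulls_loop() {
    sed '
:start
# Replace one tab+NULL that is followed by a tab or the end of the line,
# putting back whatever followed it (captured by the group).
s/\tNULL\(\t\|$\)/\t\1/
# If that substitution changed the line, branch back to :start and retry.
t start
'
}

# Expected (tabs shown as \t, line ends as $):
#   5\t\t\tfruit$
#   4\tCarrots\t$
printf '5\tNULL\tNULL\tfruit\n4\tCarrots\tNULL\n' | strip_nulls_loop | sed -n l
```

Without the g flag each pass replaces only one NULL, so line 5 goes through the loop twice before t finds nothing left to substitute.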
Information on sed flow control and other useful tidbits can be found here
Upvotes: 3
Reputation: 361565
awk -F'\t' -v OFS='\t' '{
for (i = 1; i <= NF; ++i) {
if ($i == "NULL") {
$i = "";
}
}
print
}' test.txt
The straightforward solution is to use \t as the field separator and then loop over all of the fields looking for an exact match of "NULL". No substring matching.
Here's the same thing as a one-liner:
awk -F'\t' -v OFS='\t' '{for(i=1;i<=NF;++i) if($i=="NULL") $i=""} 1' test.txt
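A quick run of the one-liner on two records shows that only fields equal to exactly NULL are emptied, while a NULL embedded in a longer field (the made-up value 400 NULL Bananas) is left alone. The awk code is unchanged from the answer and should run under any POSIX awk; the wrapper name strip_exact_nulls is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk one-liner: empty every field that is exactly "NULL".
# Assigning to $i makes awk rebuild the record, joining fields with OFS (a tab).
strip_exact_nulls() {
    awk -F'\t' -v OFS='\t' '{for(i=1;i<=NF;++i) if($i=="NULL") $i=""} 1'
}

# Expected (tabs shown as \t, line ends as $):
#   5\t\t\tfruit$
#   6\t400 NULL Bananas\t\tfruit$
printf '5\tNULL\tNULL\tfruit\n6\t400 NULL Bananas\tNULL\tfruit\n' \
    | strip_exact_nulls | sed -n l
```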
Upvotes: 3
Reputation: 784948
awk makes it simpler:
awk -F '\tNULL\\>' -v OFS='\t' '{$1=$1}1' file
1\t400 Bananas\t3.00\tfruit
2\t60 Oranges\t0.00\tfruit
3\t\t3.0\tfruit
4\tCarrots\t\tfruit
5\t\t\tfruit
Upvotes: 0