dhk

Reputation: 81

SED incorrectly replaces only the first instance of a pattern on a line

Hello: I have tab-separated data of the form customer, item description, purchase price, category

e.g. a.out contains:

1\t400 Bananas\t3.00\tfruit
2\t60 Oranges\t0.00\tfruit
3\tNULL\t3.0\tfruit
4\tCarrots\tNULL\tfruit
5\tNULL\tNULL\tfruit

I'm attempting to get rid of all the NULL fields. I can't rely on a simple replacement of the string "NULL", as it may be a substring of a longer value, so I am attempting:

sed -i 's:\tNULL\t:\t\t:g' a.out 

When I do this, I end up with:

1\t400 Bananas\t3.00\tfruit
2\t60 Oranges\t0.00\tfruit
3\t\t3.0\tfruit
4\tCarrots\t\tfruit
5\t\tNULL\tfruit

What's wrong here is that on line 5 only the first instance of the search string has been replaced.

If I run my sed command twice, I end up with the result I want:

1\t400 Bananas\t3.00\tfruit
2\t60 Oranges\t0.00\tfruit
3\t\t3.0\tfruit
4\tCarrots\t\tfruit
5\t\t\tfruit

Here you can see that line 5 has both of the NULLs removed. But I don't understand why a single pass with the g flag isn't enough.

Upvotes: 2

Views: 174

Answers (4)

sjnarv

Reputation: 2374

From grep(1) on a recent Linux:

The Backslash Character and Special Expressions

The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word [...]

--

So, how about:

sed -i 's:\<NULL\>::g' a.out
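A quick way to check what the word-boundary version does, as a minimal sketch assuming GNU sed (\< and \> are GNU extensions). Row 5 is from the question; rows 6 and 7 are made-up values added only to probe the boundaries. Tabs are shown as | in the output so they are visible.

```shell
# Row 5 comes from the question; rows 6 and 7 are hypothetical test values.
rows=$(printf '5\tNULL\tNULL\tfruit\n6\tNULLABLE\t1.00\tmisc\n7\tNULL set\t2.00\tmisc')

# Delete NULL only where it stands alone as a word.
result=$(printf '%s\n' "$rows" | sed 's:\<NULL\>::g')

# Show tabs as | for readability.
printf '%s\n' "$result" | tr '\t' '|'
# 5|||fruit
# 6|NULLABLE|1.00|misc
# 7| set|2.00|misc
```

Both NULLs on row 5 go in one pass, and NULLABLE is left alone. Row 7 shows the catch: a word boundary is not a field boundary, so a standalone NULL inside a longer field is removed too.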

Upvotes: 0

Michael Petch

Reputation: 47573

Since a tab can't appear inside a field in your case (it would imply a new field), you might be able to do what you want simply by doing this:

sed ':start ; s/\tNULL\(\t\|$\)/\t\1/ ; t start' a.out

First, the inner part s/\tNULL\(\t\|$\)/\t\1/ searches for a tab and NULL followed by either a tab or end of line ($), and replaces the match with a tab followed by whatever appeared after NULL (this last part is done using \1). We'll call that substitution expression.

We now have:

sed ':start ; expression ; t start' a.out

This is effectively a loop (like goto). :start is a label. ; acts as a statement delimiter. I have described what expression does above. t start says that if expression made a substitution, sed jumps back to the label start. The buffer will contain the substituted text. The loop repeats until no further substitution can be made on the line, and then processing continues.
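As for why a single pass with g is not enough: sed resumes scanning after the end of the previous match, so matches never overlap. On line 5 the tab that closes the first \tNULL\t is already consumed and cannot open the second one. A minimal sketch comparing the two forms, assuming GNU sed (for \t and \| in the pattern); tabs are shown as | in the output.

```shell
# Line 5 from the question, built with printf so the tabs are real.
line=$(printf '5\tNULL\tNULL\tfruit')

# Single pass with g: matches cannot overlap, so the second NULL survives.
single_pass=$(printf '%s\n' "$line" | sed 's:\tNULL\t:\t\t:g')

# Loop form: retry the substitution until nothing changes.
looped=$(printf '%s\n' "$line" | sed ':start ; s/\tNULL\(\t\|$\)/\t\1/ ; t start')

# Show tabs as | for readability.
printf '%s\n' "$single_pass" | tr '\t' '|'
# 5||NULL|fruit
printf '%s\n' "$looped" | tr '\t' '|'
# 5|||fruit
```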

Information on sed flow control and other useful tidbits can be found here

Upvotes: 3

John Kugelman

Reputation: 361565

awk -F'\t' -v OFS='\t' '{
    for (i = 1; i <= NF; ++i) {
        if ($i == "NULL") {
            $i = "";
        }
    }
    print
}' test.txt

The straightforward solution is to use \t as a field separator and then loop over all of the fields looking for an exact match of "NULL". No substringing.

Here's the same thing as a one-liner:

awk -F'\t' -v OFS='\t' '{for(i=1;i<=NF;++i) if($i=="NULL") $i=""} 1' test.txt
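To try the one-liner without touching a file, here is a small sketch that pipes three of the question's rows through it; the variable names are just for illustration, and tabs are shown as | in the output.

```shell
# Rows 3 to 5 from the question, built with printf so the tabs are real.
rows=$(printf '3\tNULL\t3.0\tfruit\n4\tCarrots\tNULL\tfruit\n5\tNULL\tNULL\tfruit')

# Blank out every field that is exactly "NULL"; the trailing 1 prints each record.
cleaned=$(printf '%s\n' "$rows" |
  awk -F'\t' -v OFS='\t' '{for(i=1;i<=NF;++i) if($i=="NULL") $i=""} 1')

# Show tabs as | for readability.
printf '%s\n' "$cleaned" | tr '\t' '|'
# 3||3.0|fruit
# 4|Carrots||fruit
# 5|||fruit
```

Because the comparison is per field, this also catches a NULL in the first or last column, which the \tNULL\t pattern cannot.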

Upvotes: 3

anubhava

Reputation: 784948

awk makes it simpler:

awk -F '\tNULL\\>' -v OFS='\t' '{$1=$1}1' file
1\t400 Bananas\t3.00\tfruit
2\t60 Oranges\t0.00\tfruit
3\t\t3.0\tfruit
4\tCarrots\t\tfruit
5\t\t\tfruit

Upvotes: 0
