sunny

Reputation: 3891

How does this sed command parse numbers with commas?

I'm having difficulty understanding a number-parsing sed command I saw in this article:

sed -i ':a;s/\B[0-9]\{3\}\>/,&/;ta' numbers.txt

I'm a sed newbie, so this is what I've been able to figure out:

Here's what I am hoping folks can explain

Upvotes: 0

Views: 326

Answers (2)

choroba

Reputation: 242383

The regex matches the leftmost run of three digits that is NOT preceded by a word boundary (\B) and is followed by an end-of-word boundary (\>). In practice that is the rightmost three digits of a number, provided at least one more digit comes before them. After the comma is inserted, the "goto" (ta) makes sed try the substitution again, but the comma has introduced a new word boundary, so the next match lands three digits further to the left.
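To make the passes visible, here is a minimal sketch that runs the substitution once per pass from a shell loop instead of using the :a/ta loop inside sed. The helper name step and the sample number 1234567 are made up for illustration, and GNU sed is assumed because \B and \> are GNU extensions.

```shell
# Run the substitution exactly once (no :a/ta loop) on the given string.
# Assumes GNU sed: \B and \> are GNU extensions.
step() { printf '%s\n' "$1" | sed 's/\B[0-9]\{3\}\>/,&/'; }

n=1234567   # hypothetical sample value
while :; do
    next=$(step "$n")
    # An unchanged string means no substitution happened,
    # which is the point where "ta" would stop branching.
    if [ "$next" = "$n" ]; then
        break
    fi
    echo "$n -> $next"
    n=$next
done
# prints:
# 1234567 -> 1234,567
# 1234,567 -> 1,234,567
```

The third pass finds nothing to replace: in 1,234,567 every group of three digits is preceded by a comma (a word boundary), so \B fails and the loop ends.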

Upvotes: 2

anubhava

Reputation: 786289

Breaking down this command:

sed -i ':a;s/\B[0-9]\{3\}\>/,&/;ta' numbers.txt

-i     # edit the input file in place, saving the changes to it
\B     # opposite of \b: matches where there is NO word boundary, i.e. inside a word
[0-9]  # match any digit
\{3\}  # match exactly 3 of the preceding item (digits)
\>     # end-of-word boundary
&      # the matched text, reused in the replacement
:a     # define label a
ta     # branch back to label a if the s/// made a substitution, i.e. loop until \B[0-9]\{3\}\> no longer matches

Yes, this sed command starts matching and replacing at the rightmost 3 digits, then keeps moving left for as long as it finds another group of 3 digits that has at least one digit before it.
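As a quick check of the breakdown, the same sed script can be run on standard input without -i, so no file is modified. This is only a sketch: the wrapper name add_commas is made up for illustration, and GNU sed is assumed.

```shell
# Same script as in the question, minus -i, so it filters stdin to stdout.
# Assumes GNU sed (\B and \> are GNU extensions).
add_commas() { sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'; }

printf '%s\n' 20130607215015 607220701 992171 | add_commas
# prints:
# 20,130,607,215,015
# 607,220,701
# 992,171
```

Note that 992171 becomes 992,171 and not ,992,171: the \B refuses a match at the very start of the number, because the start of a word is a word boundary.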


Update: However, since this sed command loops inefficiently, I recommend this much simpler and faster awk command instead:

awk '/^[0-9]+$/{printf "%\047.f\n", $1}' file
20,130,607,215,015
607,220,701
992,171

Where input file is:

cat file
20130607215015
607220701
992171
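One caveat, as far as I know: the ' flag in printf only inserts separators when the current locale defines a thousands separator and the awk implementation supports the flag, so in the C locale the numbers may come out unchanged. A locale-independent sketch, using plain string slicing instead of printf (a different technique from the one-liner above), could look like this. The function name group_digits is made up for illustration.

```shell
# Insert commas by peeling off three digits at a time from the right.
# Pure string handling, so it does not depend on the locale
# or on floating-point precision.
group_digits() {
    awk '/^[0-9]+$/ {
        s = $0; out = ""
        while (length(s) > 3) {
            out = "," substr(s, length(s) - 2) out   # last three digits
            s = substr(s, 1, length(s) - 3)          # everything before them
        }
        print s out
    }'
}

printf '%s\n' 20130607215015 607220701 992171 | group_digits
```

Like the original one-liner, this only prints lines that consist entirely of digits.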

Upvotes: 3
