Reputation: 3891
I'm having difficulty understanding a number-parsing sed command I saw in this article:
sed -i ':a;s/\B[0-9]\{3\}\>/,&/;ta' numbers.txt
I'm a sed newbie, so this is what I've been able to figure out:
& adds to what's already there rather than substituting it
:a; ... ;ta calls the substitution recursively on the line until the search finds no more matches
Here's what I am hoping folks can explain:
What does -i do? I can't seem to find it on the man pages though I'm sure it's there.
What is \B accomplishing here? Perhaps it helps with the left-right parsing priority, but I don't see how.
So lastly...
1234566778,9 ---> 1234,566,778,9
Upvotes: 0
Views: 326
Reputation: 242383
The regex matches the leftmost three digits that are NOT preceded by a word boundary and ARE followed by the end of a word, which in practice means the rightmost three digits of the number. After inserting the comma, the "goto" (ta) makes it match again, but the comma has introduced a new word boundary, so the next match ends just before that comma, three digits further to the left.
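To watch the match move leftward, the substitution can be run one pass at a time without the loop. This is only a sketch: it assumes GNU sed (\B and \> are GNU extensions), and both the sample value 1234567 and the helper name one_pass are made up for illustration.

```shell
# One pass of the substitution, without the :a ... ta loop.
# Assumes GNU sed; 1234567 is an illustrative value, not from the question.
one_pass() {
    sed 's/\B[0-9]\{3\}\>/,&/'
}

# Pass 1: only "567" is followed by the end of the word.
step1=$(printf '%s\n' 1234567 | one_pass)
# Pass 2: the new comma ends the word "1234", so "234" now qualifies.
step2=$(printf '%s\n' "$step1" | one_pass)
# Pass 3: "1" has no digit in front of it, so nothing matches any more.
step3=$(printf '%s\n' "$step2" | one_pass)

printf '%s\n' "$step1" "$step2" "$step3"
```

This prints 1234,567, then 1,234,567, then 1,234,567 again. The third pass changes nothing, which is the condition under which ta stops branching.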
Upvotes: 2
Reputation: 786289
Dissecting this command:
sed -i ':a;s/\B[0-9]\{3\}\>/,&/;ta' numbers.txt
-i # edit the file in place, saving changes back to the input file
\B # opposite of \b: matches only where there is NO word boundary (here, between two digits)
[0-9] # match any digit
\{3\} # match exactly 3 digits
\> # end-of-word boundary
& # the matched text, reused in the replacement
:a # define label a
ta # branch back to label a if the substitution succeeded, i.e. loop until \B[0-9]\{3\}\> no longer matches
Yes indeed, this sed command starts matching and replacing at the rightmost 3 digits and keeps moving left as long as it finds another 3 digits that are preceded by a digit.
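The loop can be tried without -i by feeding lines on standard input, so that no file is modified. A minimal sketch, assuming GNU sed; the wrapper name add_commas is hypothetical, and the three numbers are the sample input used further down in this answer.

```shell
# Same sed script as above, but reading stdin instead of editing numbers.txt in place.
# Assumes GNU sed (for \B, \> and the ';' after the label).
add_commas() {
    sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'
}

result=$(printf '%s\n' 20130607215015 607220701 992171 | add_commas)
printf '%s\n' "$result"
```

This prints 20,130,607,215,015, 607,220,701 and 992,171, one per line. Once the output looks right, adding -i and a file name applies the same edit to the file itself.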
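Update: However, since this sed command loops inefficiently, I recommend this much simpler and faster awk instead (\047 is the ' printf flag, which needs an awk that supports it and a locale that defines a thousands separator):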
awk '/^[0-9]+$/{printf "%\047.f\n", $1}' file
20,130,607,215,015
607,220,701
992,171
Where input file is:
cat file
20130607215015
607220701
992171
Upvotes: 3