Pawel

Reputation: 441

sed insert spaces between digits for long numbers

How can one use sed to insert a space after every three digits, but only if a number is longer than 10 digits? For example:

blahaaaaaa goog sdd 234 3242423
ala el 213123123123 
1231231313123 i 14124124141411

should turn into:

blahaaaaaa goog sdd 234 3242423
ala el 213 123 123 123
123 123 131 312 3 i 141 241 241 414 11

I can easily split numbers into 3-digit groups using sed 's/[0-9]\{3\}/& /g', but I cannot combine that with a condition on the number's length.
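For illustration, running that command on the first sample line shows the problem: the short numbers get split as well. This is only a quick check; naive_split is a hypothetical wrapper name, not part of any tool.

```shell
# Hypothetical wrapper around the plain substitution from the question.
naive_split() {
  sed 's/[0-9]\{3\}/& /g'
}

# First sample line from the question: neither number should be touched.
printf '%s\n' 'blahaaaaaa goog sdd 234 3242423' | naive_split
# prints: blahaaaaaa goog sdd 234  324 242 3
```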

Upvotes: 3

Views: 369

Answers (4)

M. Nejat Aydin

Reputation: 10123

A single (GNU) sed command could be enough:

sed -E 's/([0-9]{10,})/\n&\n/g; :a; s/([ \n])([0-9]{3})([0-9]+\n)/\1\2 \3/; ta; s/\n//g' file

Update:

Walter A suggested a bit more concise sed expression which works fine if I haven't overlooked something:

sed -E 's/([0-9]{10,})/&\n/g; :a; s/([0-9]{3})([0-9]+\n)/\1 \2/; ta; s/\n//g' file

Explanation:

  • The -E flag tells sed to use extended regular expression syntax, which removes the need for backslashes before the ( ) { } + characters.
  • s/([0-9]{10,})/&\n/g appends a newline (\n) character to every digit sequence of 10 or more digits. This marks the digit sequences we want to process. The \n is a safe choice here because it cannot occur in the pattern space as read from an input line: it is the delimiter that terminates the line. Note that we process a single line per cycle (i.e., since no multiline techniques are used, \n can serve as an anchor without interfering with other characters in the line).
  • :a; s/([0-9]{3})([0-9]+\n)/\1 \2/; ta This is a loop. :a is a label and could be any word (the : indicates the label). ta means jump to the label a if the last substitution (s command) succeeded. The s command, being the body of the loop, repeatedly replaces, from left to right, a 3-digit sequence with the same 3 digits followed by a space, but only if that sequence is immediately followed by one or more digits and then a \n. The loop ends when no substitution is possible.
  • s/\n//g removes all \n instances from the resulting pattern space. They served as anchors, or markers, delimiting the end of each digit sequence of 10 or more digits. Their job is done at this point.
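The steps above can be tried out as follows. This is a minimal sketch assuming GNU sed; group_long_numbers is a hypothetical wrapper name, and the <NL> text is only there to make the newline marker visible.

```shell
# Wrapper around the concise command from this answer (GNU sed assumed).
# group_long_numbers is a hypothetical name used only for this sketch.
group_long_numbers() {
  sed -E 's/([0-9]{10,})/&\n/g; :a; s/([0-9]{3})([0-9]+\n)/\1 \2/; ta; s/\n//g'
}

# Step 1 only, with the newline marker shown as <NL>.
printf '%s\n' 'ala el 213123123123 x' |
  sed -E 's/([0-9]{10,})/&\n/g; s/\n/<NL>/g'
# prints: ala el 213123123123<NL> x

# Full command on two of the question's sample lines.
printf '%s\n' 'sdd 234 3242423' '1231231313123 i 14124124141411' |
  group_long_numbers
# prints:
# sdd 234 3242423
# 123 123 131 312 3 i 141 241 241 414 11
```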

Upvotes: 5

potong

Reputation: 58420

This might work for you (GNU sed):

cat <<\!|sed -Ef - file
/[[:digit:]]{10,}/{
  s//\n&\n/
  h
  s/.*\n(.*)\n.*/\1/
  s/.{3}\B/& /g
  G
  s/(.*)(\n.*)\n.*\n/\2\1/
  D
}
!

Determine if the current line has any 10 or more digit numbers and if so process them.

Surround the first such number by newlines.

Copy the whole line to the hold space (HS).

Remove everything except the number from the current line.

Space the number every 3 digits (only do so if there is a following digit).

Append the original line from HS to the current line.

Replace the original number by the spaced number and remove all introduced newlines except the first.

Delete the introduced newline and thus repeat the process.
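To watch these steps, one can stop the script right after G and dump the pattern space with the l command. This is a debugging sketch assuming GNU sed; trace_after_G is a hypothetical name, and the sample line is borrowed from the question with an extra x appended for illustration.

```shell
# Run the script only up to G, then print the pattern space unambiguously
# with l (newlines appear as \n, the end of the buffer as $).
trace_after_G() {
  sed -nE '/[[:digit:]]{10,}/{s//\n&\n/;h;s/.*\n(.*)\n.*/\1/;s/.{3}\B/& /g;G;l}'
}

printf '%s\n' 'ala el 213123123123 x' | trace_after_G
# prints: 213 123 123 123\nala el \n213123123123\n x$
```

The spaced number comes first, followed by the original line with the number still wrapped in newlines; the final substitution then swaps the spaced number into place.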

N.B. The D command removes everything up to and including the first newline in the current line, i.e. the pattern space. If there is no newline, it acts the same as the d command. However, if there is a newline, then once it has removed the text up to and including that newline, it begins a new cycle without reading another line from the input. Thus it treats whatever remains in the pattern space as if it had just been read as a new line of input and starts the sed cycle again. Inserting a newline and then using the D command therefore has the same effect as a :a;...;ba loop.

Or if you prefer:

sed -E '/[[:digit:]]{10,}/{s//\n&\n/;h;s/.*\n(.*)\n.*/\1/;s/.{3}\B/& /g;G;s/(.*)(\n.*)\n.*\n/\2\1/;D}' file

An alternative that just uses the pattern space:

sed -E '/[[:digit:]]{10,}/{s//\n&\n/;s/(.*)(\n.*)(\n.*)/\1\3\2/;:a;s/^(.*\n.*\n([[:digit:]]{3} )*[[:digit:]]{3}\B)/\1 /;ta;s/(.*)\n(.*)\n(.*)/\n\1\3\2/;D}' file
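The N.B. above about D can be checked on a toy input. This is a minimal sketch assuming GNU sed; both function names are hypothetical, and the branch version uses ta rather than ba only to keep the one-liner short.

```shell
# Both functions replace every "a" with "X", one match per pass.
loop_with_D() {
  # Prepend a newline, then let D strip it and restart the script
  # on the remaining pattern space without reading new input.
  sed '/a/{s//X/;s/^/\n/;D}'
}

loop_with_branch() {
  # Classic label-and-branch loop: repeat while a substitution succeeds.
  sed ':a;s/a/X/;ta'
}

printf '%s\n' 'banana' | loop_with_D       # prints: bXnXnX
printf '%s\n' 'banana' | loop_with_branch  # prints: bXnXnX
```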

Upvotes: 1

Walter A

Reputation: 20002

Preprocess the file so that every word sits on its own line, apply sed, then postprocess to restore the original layout:

tr "\n " "\r\n" < "${file}" | sed -r '/[0-9]{10}/ s/[0-9]{3}/& /g'  | tr '\r\n' '\n '
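To see what the preprocessing does, the first tr can be run on its own. This is a sketch that reads stdin instead of "${file}"; space_long_words is a hypothetical name, and the sample line is borrowed from the question with an extra x appended.

```shell
# Step 1 alone: each space-separated word lands on its own line
# (the second tr here only turns the \r back into a visible line break).
printf '%s\n' 'ala el 213123123123 x' | tr "\n " "\r\n" | tr '\r' '\n'
# prints:
# ala
# el
# 213123123123
# x

# Full pipeline from this answer, reading stdin instead of "${file}".
space_long_words() {
  tr "\n " "\r\n" | sed -r '/[0-9]{10}/ s/[0-9]{3}/& /g' | tr '\r\n' '\n '
}

printf '%s\n' 'ala el 213123123123 x' | space_long_words
```

Because s/[0-9]{3}/& /g adds a space after every complete group of three, a number whose length is a multiple of three ends up with an extra space after its last group.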

Upvotes: 1

Wiktor Stribiżew

Reputation: 626845

When you need to meet a complex set of requirements like this, it is more convenient to use perl:

perl -i -pe 's/\d{10,}/$&=~s|\d{3}|$& |gr/ge' file

Here,

  • \d{10,} - matches 10 or more consecutive digits
  • $&=~s|\d{3}|$& |gr - takes the whole match (the 10+ digit substring) and replaces every 3-digit chunk (matched with \d{3}) with that chunk ($& is the placeholder for the whole match) followed by a space. g performs as many replacements as there are matches in the input, and r returns the result of the substitution while leaving the original string untouched.
  • ge - with this flag combination, g replaces all matches, and e is necessary because the replacement here is a Perl expression to be evaluated rather than a literal string.

Upvotes: 2
