Reputation: 441
How can one use sed to insert a space between every group of three digits, but only if a number is longer than 10 digits? I.e.:
blahaaaaaa goog sdd 234 3242423
ala el 213123123123
1231231313123 i 14124124141411
should turn into:
blahaaaaaa goog sdd 234 3242423
ala el 213 123 123 123
123 123 131 312 3 i 141 241 241 414 11
I can easily split digits into groups of three with sed 's/[0-9]\{3\}/& /g'
but I can't work out how to combine that with a condition on the length of the number.
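For comparison, here is the attempt from the question applied to the first sample line. This is a small POSIX sh sketch, and naive_split is a hypothetical wrapper name. It shows the problem: the 3-digit 234 and the 7-digit 3242423 are split too, and 234 ends up followed by two spaces.

```shell
#!/bin/sh
# The attempt from the question: it appends a space after every run of
# three digits, no matter how long the surrounding number is.
naive_split() {
  sed 's/[0-9]\{3\}/& /g'
}

# First sample line: 234 and 3242423 are short, yet both get split.
echo 'blahaaaaaa goog sdd 234 3242423' | naive_split
# prints: blahaaaaaa goog sdd 234  324 242 3
```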
Upvotes: 3
Views: 369
Reputation: 10123
A single (GNU) sed command could be enough:
sed -E 's/([0-9]{10,})/\n&\n/g; :a; s/([ \n])([0-9]{3})([0-9]+\n)/\1\2 \3/; ta; s/\n//g' file
Update:
Walter A suggested a slightly more concise sed expression, which works fine unless I have overlooked something:
sed -E 's/([0-9]{10,})/&\n/g; :a; s/([0-9]{3})([0-9]+\n)/\1 \2/; ta; s/\n//g' file
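A quick check of the concise command on the sample input from the question. This is a sketch assuming GNU sed (a \n in the replacement is a GNU extension); space_long_numbers is a hypothetical wrapper name that filters stdin.

```shell
#!/bin/sh
# Assumes GNU sed. space_long_numbers is a hypothetical wrapper around
# the concise command above; it reads stdin and writes stdout.
space_long_numbers() {
  sed -E 's/([0-9]{10,})/&\n/g; :a; s/([0-9]{3})([0-9]+\n)/\1 \2/; ta; s/\n//g'
}

# Sample input from the question.
printf '%s\n' \
  'blahaaaaaa goog sdd 234 3242423' \
  'ala el 213123123123' \
  '1231231313123 i 14124124141411' | space_long_numbers
```

It should print exactly the three expected lines from the question: the first line unchanged, then "ala el 213 123 123 123" and "123 123 131 312 3 i 141 241 241 414 11".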
Explanation:
-E: instructs sed to use the extended regular expression syntax, so no escaping backslashes are needed before the (){}+ characters.
s/([0-9]{10,})/&\n/g: appends a newline (\n) character to every digit sequence of 10 or more digits, to mark the digit sequences we are dealing with. \n is a safe choice here because it cannot occur in the pattern space as read from the input: it is the delimiter that terminates the line. Since only a single line is processed per cycle (no multiline techniques are used), \n can serve as an anchor without interfering with other characters in the line.
:a; s/([0-9]{3})([0-9]+\n)/\1 \2/; ta: this is a loop. :a is a label (the : introduces it, and the name a could be any word). ta jumps to label a if an s command has succeeded since the last input line was read or the last t jump was taken. The s command in the loop body replaces, from left to right, a 3-digit sequence with the same 3 digits followed by a space, but only if that sequence is immediately followed by one or more digits terminated by a \n character. The loop repeats until no substitution is possible.
s/\n//g: removes all \n characters from the resulting pattern space. They were used as anchors, or markers, to delimit the end of the digit sequences of 10 or more digits, and their job is now done.
Upvotes: 5
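To watch the loop at work, GNU sed's l command can be dropped in right after the :a label. l prints the pattern space unambiguously, showing the marker newline as \n and the end of the buffer as $, so every pass through :a ... ta becomes visible. This is a sketch assuming GNU sed; trace_loop is a hypothetical name.

```shell
#!/bin/sh
# Assumes GNU sed. trace_loop is a hypothetical name: the concise command
# with `l` added after the :a label, so the pattern space is printed on
# every pass (\n for the marker newline, $ for the end of the buffer).
trace_loop() {
  sed -E 's/([0-9]{10,})/&\n/g; :a; l; s/([0-9]{3})([0-9]+\n)/\1 \2/; ta; s/\n//g'
}

echo 213123123123 | trace_loop
# 213123123123\n$
# 213 123123123\n$
# 213 123 123123\n$
# 213 123 123 123\n$
# 213 123 123 123
```

The last pass finds no 3-digit group followed by more digits before the \n, so ta falls through and s/\n//g produces the final line.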
Reputation: 58420
This might work for you (GNU sed):
cat <<\!|sed -Ef - file
/[[:digit:]]{10,}/{
s//\n&\n/
h
s/.*\n(.*)\n.*/\1/
s/.{3}\B/& /g
G
s/(.*)(\n.*)\n.*\n/\2\1/
D
}
!
Determine whether the current line contains any numbers of 10 or more digits and, if so, process them.
Surround the first such number by newlines.
Copy the whole line to the hold space (HS).
Remove everything except the number from the current line.
Space the number every 3 digits (only do so if there is a following digit).
Append the original line from HS to the current line.
Replace the original number by the spaced number and remove all introduced newlines except the first.
Delete the introduced newline and thus repeat the process.
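The steps can be watched on one line of the sample. This is a sketch assuming GNU sed: the same commands as the script above, one step per line, with sed's l command (print the pattern space unambiguously) added after each step that changes the pattern space. trace_steps is a hypothetical name. The six output lines correspond to: number wrapped in newlines, number isolated, number spaced, original line appended from HS, spaced number swapped in, and the final line after D restarts the cycle.

```shell
#!/bin/sh
# Assumes GNU sed. trace_steps is a hypothetical name: the script above,
# one step per line, with `l` after each step that changes the pattern
# space (\n marks a newline, $ the end of the pattern space).
trace_steps() {
  sed -E '/[[:digit:]]{10,}/{
    s//\n&\n/;l
    h;s/.*\n(.*)\n.*/\1/;l
    s/.{3}\B/& /g;l
    G;l
    s/(.*)(\n.*)\n.*\n/\2\1/;l
    D
  }'
}

echo 'ala el 213123123123' | trace_steps
# ala el \n213123123123\n$
# 213123123123$
# 213 123 123 123$
# 213 123 123 123\nala el \n213123123123\n$
# \nala el 213 123 123 123$
# ala el 213 123 123 123
```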
N.B. The D command removes everything up to and including the first newline in the current line, i.e. the pattern space. If there is no newline, it acts the same as the d command. However, if there is a newline, then once it has removed the text up to and including that newline, it begins a new cycle without reading another line of input. It thus treats whatever remains in the pattern space as if it were a freshly read line of input and runs the sed script again. Inserting a newline and then using the D command is therefore equivalent to a :a;...;ba loop.
Or if you prefer:
sed -E '/[[:digit:]]{10,}/{s//\n&\n/;h;s/.*\n(.*)\n.*/\1/;s/.{3}\B/& /g;G;s/(.*)(\n.*)\n.*\n/\2\1/;D}' file
An alternative that just uses the pattern space:
sed -E '/[[:digit:]]{10,}/{s//\n&\n/;s/(.*)(\n.*)(\n.*)/\1\3\2/;:a;s/^(.*\n.*\n([[:digit:]]{3} )*[[:digit:]]{3}\B)/\1 /;ta;s/(.*)\n(.*)\n(.*)/\n\1\3\2/;D}' file
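The D-based loop described in the N.B. above can be compared with an explicit :a;...;ba loop on a hypothetical toy task: turning every 1 into -, one occurrence per pass (s/1/-/g would do it in one go; the point is only the loop mechanics). with_D and with_branch are illustrative names, and the sketch assumes GNU sed, which understands \n in the replacement.

```shell
#!/bin/sh
# Assumes GNU sed (\n in the replacement is a GNU extension).
# Hypothetical toy task: turn every "1" into "-", one change per pass.

with_D() {
  # Make one change, prepend a newline, then D: it deletes that newline
  # and restarts the script on the rest, without reading the next line.
  sed '/1/{s/1/-/;s/^/\n/;D;}'
}

with_branch() {
  # The same loop written with an explicit label and branch.
  sed ':a;/1/{s/1/-/;ba;}'
}

printf 'a1b1c\nnext\n' | with_D
printf 'a1b1c\nnext\n' | with_branch
```

Both print a-b-c followed by next: the second input line is only read once the first one no longer contains a 1.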
Upvotes: 1
Reputation: 20002
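Preprocess and postprocess the file: the first tr turns newlines into carriage returns and spaces into newlines, so sed sees roughly one word per line; sed then splits digits into groups of three only on lines containing 10 consecutive digits, and the second tr swaps the characters back: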
tr "\n " "\r\n" < "${file}" | sed -r '/[0-9]{10}/ s/[0-9]{3}/& /g' | tr '\r\n' '\n '
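To see what sed actually receives between the two tr calls, the stream can be dumped with sed -n l, which prints each line unambiguously (a carriage return shows as \r, the end of the line as $). This sketch assumes GNU sed and a tr that understands the \n and \r escapes; show_tr_stream is a hypothetical name. Note that the last word of one input line and the first word of the next end up on the same sed line, joined by \r.

```shell
#!/bin/sh
# Assumes GNU sed and tr. show_tr_stream is a hypothetical name: it runs
# only the first tr and dumps each resulting line with `sed -n l`.
show_tr_stream() {
  tr '\n ' '\r\n' | sed -n l
}

printf '%s\n' 'ala el 213123123123' '1231231313123 i 14124124141411' |
  show_tr_stream
# ala$
# el$
# 213123123123\r1231231313123$
# i$
# 14124124141411\r$
```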
Upvotes: 1
Reputation: 626845
When you need to meet a complex set of requirements like this, it is more convenient to use perl:
perl -i -pe 's/\d{10,}/$&=~s|\d{3}(?=\d)|$& |gr/ge' file
Here,
\d{10,} - matches 10 or more consecutive digits.
$&=~s|\d{3}(?=\d)|$& |gr - takes the whole match (the 10+ digit substring) and replaces every 3-digit chunk that is followed by another digit (matched with \d{3}(?=\d)) with itself ($& here holds the current 3-digit match) plus a space. The (?=\d) lookahead keeps a number whose length is a multiple of 3 from getting a trailing space. g performs as many replacements as there are matches in the input, and r returns the result of the substitution while leaving the original string untouched.
ge - this flag combination means that all matches will be replaced (g), and e makes Perl evaluate the replacement as Perl code (here, the inner s|...|...|gr expression) and use its result.
Upvotes: 2