Reputation: 1481
I've been able to do this in pieces via awk print $2, sed [a-z] etc, but how would I do this to one stream via sed all at once?
host_192.168.0.100 host_192.168.0.100
Turns into
host_192.168.0.100 192.168.0.100
Also, 'host' is just a placeholder, I really do need 'all' letters removed leaving numbers/punctuation.
Edit: Grabbing the underscore would be nice as well; however, I'm sure I can figure that out. Some other common examples would be:
ab-ab-abababab-ABABABAB-000.000.000.0 ab-ab-abababab-ABABABAB-000.000.000.0
01-admin-10.10.10.10 01-admin-10.10.10.10
10.10.10.10-NAT 10.10.10.10-NAT
1test-10.10.10.10 1test-10.10.10.10
Thanks!
Upvotes: 0
Views: 297
Reputation: 753675
Given the second example, it appears that you want to remove all the non-digits appearing after the first white space, logically before the first digit. You need it to remove dashes, underscores, even dots, as well as letters; anything that isn't a digit. That suggests:
sed -e 's/ [^0-9]*/ /'
This is fairly minimalistic, but meets your criteria:
$ cat data
host_192.168.0.100 host_192.168.0.100
ab-ab-abababab-ABABABAB-000.000.000.0 ab-ab-abababab-ABABABAB-000.000.000.0
$ sed -e 's/ [^0-9]*/ /' data
host_192.168.0.100 192.168.0.100
ab-ab-abababab-ABABABAB-000.000.000.0 000.000.000.0
$
A large part of the skill in writing good regular expressions is writing a good description of what you want the regular expression to actually do (in terms that make sense to regular expressions).
The three new items with leading digits and letters, and with trailing material, complicate life considerably:
$ cat data
host_192.168.0.100 host_192.168.0.100
ab-ab-abababab-ABABABAB-000.000.000.0 ab-ab-abababab-ABABABAB-000.000.000.0
01-admin-10.10.10.10 01-admin-10.10.10.10
10.10.10.10-NAT 10.10.10.10-NAT
1test-10.10.10.10 1test-10.10.10.10
$ sed -e 's/ [^0-9]*/ /' \
> -e 's/ [^.]*-\([0-9][0-9.]*[0-9]\)/ \1/' \
> -e 's/ \([0-9][0-9.]*[0-9]\)[^0-9.].*$/ \1/' data
host_192.168.0.100 192.168.0.100
ab-ab-abababab-ABABABAB-000.000.000.0 000.000.000.0
01-admin-10.10.10.10 10.10.10.10
10.10.10.10-NAT 10.10.10.10
1test-10.10.10.10 10.10.10.10
$
The sed script acquires 3 independent cleaning expressions. The first, as before, removes any non-digits immediately after a space. It is unlikely to need tweaking.
The 01-admin- line, though, is untouched by that; the second regular expression deals with that by looking for a blank, a sequence of non-dots followed by a dash, and then capturing a sequence starting with a digit, continuing with interleaved digits and dots, and ending with a digit, replacing it with the remembered string of digits and dots. Matching the dash is key to that working sanely; if you're not careful, the * is too greedy (so, for example, s/ .*\([0-9][0-9.]*[0-9]\)/\1/ gobbled the leading digits off the IP-address component). I'm assuming that sed doesn't have non-greedy quantifiers such as *?; you might come up with a different answer if your version does (but this version will work as well). You may need to tweak that pattern to handle other exceptional cases; please do that for yourself, not as an edit to this question.
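To make the greediness problem concrete, here is a small sketch that runs the over-greedy pattern and the dash-anchored pattern on the same line. The function names greedy and anchored are made up for illustration, and the greedy replacement has a space added before \1 so that the two fields stay visibly separate.

```shell
#!/bin/sh
# Over-greedy attempt: " .*" swallows as much as it can, so the capture
# group is left with only the last two digits of the line.
# (The replacement has a leading space added here for readability.)
greedy() {
    sed -e 's/ .*\([0-9][0-9.]*[0-9]\)/ \1/'
}

# Dash-anchored version from the answer: "[^.]*-" cannot run past the
# first dot, so the capture group gets the whole dotted sequence.
anchored() {
    sed -e 's/ [^.]*-\([0-9][0-9.]*[0-9]\)/ \1/'
}

line='01-admin-10.10.10.10 01-admin-10.10.10.10'

printf '%s\n' "$line" | greedy     # prints: 01-admin-10.10.10.10 10
printf '%s\n' "$line" | anchored   # prints: 01-admin-10.10.10.10 10.10.10.10
```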
The third regular expression deals with the trailing -NAT and other such material; it looks for and remembers the sequence of digits and dots (starting and ending with a digit), followed by a non-digit, non-dot character and any other trailing material, replacing it with the remembered string of digits and dots. This is unlikely to need much tweaking.
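If the value to keep is always an IPv4-style dotted quad (an assumption; the question only says "numbers/punctuation"), the three expressions above could perhaps be folded into one extended regular expression. This is a different technique from the answer's three-step script, not a restatement of it. It assumes a sed that accepts -E (GNU and BSD sed both do), and the function name second_field_ip is hypothetical.

```shell
#!/bin/sh
# Sketch: keep only a dotted quad in the second field.
#   ([^ ]*[^0-9.])?           optional prefix ending in a non-digit, non-dot
#   ([0-9]+(\.[0-9]+){3})     the dotted quad that is kept (group 2)
#   [^ ]*$                    any trailing material such as "-NAT"
# Assumes exactly one space per line and one dotted quad in the second field.
second_field_ip() {
    sed -E 's/ ([^ ]*[^0-9.])?([0-9]+(\.[0-9]+){3})[^ ]*$/ \2/'
}

second_field_ip <<'EOF'
host_192.168.0.100 host_192.168.0.100
01-admin-10.10.10.10 01-admin-10.10.10.10
10.10.10.10-NAT 10.10.10.10-NAT
1test-10.10.10.10 1test-10.10.10.10
EOF
```

Lines whose second field contains no dotted quad are left unchanged, since the substitution simply fails to match.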
Upvotes: 1
Reputation: 200263
Try this:
sed 's/^\([^ ]*\) [a-z_-]*\(.*\)/\1 \2/i'
Edit: Updated to reflect the changed requirements.
Upvotes: 1
Reputation: 65791
A simplistic way that could work:
sed 's/ [A-Za-z_]*/ /'
Example:
$ sed 's/ [A-Za-z_]*/ /' <<<'host_192.168.0.100 host_192.168.0.100'
host_192.168.0.100 192.168.0.100
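A small variation, sketched here as an assumption rather than as part of the answer above, adds the dash to the bracket expression so that dash-separated prefixes are stripped too. The function name strip_prefix is made up for illustration. It still only removes a leading run of letters, underscores and dashes, so fields that start with a digit or end in trailing text come through unchanged.

```shell
#!/bin/sh
# Strip a leading run of letters, underscores and dashes from whatever
# follows the first space. The "-" goes last in the bracket expression
# so that it is taken literally rather than as a range.
strip_prefix() {
    sed 's/ [A-Za-z_-]*/ /'
}

printf '%s\n' 'ab-ab-abababab-ABABABAB-000.000.000.0 ab-ab-abababab-ABABABAB-000.000.000.0' | strip_prefix
# prints: ab-ab-abababab-ABABABAB-000.000.000.0 000.000.000.0

printf '%s\n' '01-admin-10.10.10.10 01-admin-10.10.10.10' | strip_prefix
# prints the line unchanged, because the second field starts with a digit
```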
Upvotes: 1