user972276

Reputation: 3053

bash string manipulation using sed/regex

Through bash, I am trying to extract part of a line from a file.

Currently I am using two sed commands back to back like so:

sed -n -e "s/^abc=//p" file | sed -n -e "s/\.//gp"

which can take in abc=1.2.3 and spit out 123. This got me thinking... Can I achieve this with just one command call? As in I want to find all strings in a file that match abc=<digit1>\.<digit2>\.<digit3> and spit out <digit1><digit2><digit3>.

EDIT:

Just to clarify, I want this to only print out lines that match. For instance, if I have the following file:

1.2.3.4
abc=quack
qtip=1.2.3
abc=1.2.3
abc = 4.5.6

running the command should only print 123

Upvotes: 2

Views: 2319

Answers (5)

John1024

Reputation: 113844

Here is an approach using awk that will work with any number of digits, not just three, separated by periods:

$ echo 'abc=1.2.3.4' | awk -F. -v OFS= '{sub(/.*=/, "", $1); print}'
1234
$ echo 'abc=1.2' | awk -F. -v OFS= '{sub(/.*=/, "", $1); print}'
12

Taking the awk command in parts:

  • -F.

    Use a period as the field separator. If the input is abc=1.2, for example, then awk sees two fields: abc=1 and 2.

  • -v OFS=

    This tells awk not to put any spaces between the fields when we print them out.

  • sub(/.*=/, "", $1)

    This removes the abc= part from the beginning of the line.

  • print

    This prints out the final line.
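To see the field splitting described above in isolation, here is a small sketch that prints the field count and then each field. The function name show_fields is only an illustrative helper and is not part of the original command.

```shell
# Hypothetical helper: show how awk splits a line when the separator is a period.
show_fields() {
    printf '%s\n' "$1" | awk -F. '{print NF; for (i = 1; i <= NF; i++) print $i}'
}

show_fields 'abc=1.2'
```

For abc=1.2 this should print 2, then abc=1, then 2, which matches the two fields mentioned for -F. above.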

Selecting which lines to process

Suppose that we only want to process lines starting with abc= and followed only by numbers and periods. In that case:

$ awk -F. -v OFS= '/^abc=[0-9.]+$/ {sub(/.*=/, "", $1); print}' sample
123

where sample is the name of the file containing the sample lines in the updated question.

The sole change above is the addition of the pattern /^abc=[0-9.]+$/. This limits the commands that follow so they apply only to lines matching this regular expression. Since /^abc=[0-9.]+$/ only matches lines that start with abc= followed by any combination of digits or periods, only those lines are processed. Non-matching lines are ignored.
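Note that [0-9.]+ also accepts lines such as abc=... or abc=7..8. If the intent is exactly three dot-separated groups of digits (an assumption about the question, not something the asker confirmed), a stricter pattern might look like the sketch below. The function names and the two extra edge-case lines are made up for illustration.

```shell
# Sample lines from the question plus two hypothetical edge cases (abc=... and abc=7..8).
sample_lines() {
    printf '%s\n' '1.2.3.4' 'abc=quack' 'qtip=1.2.3' 'abc=1.2.3' 'abc = 4.5.6' 'abc=...' 'abc=7..8'
}

# Stricter pattern: exactly three dot-separated groups of digits after abc=.
strict_extract() {
    awk -F. -v OFS= '/^abc=[0-9]+\.[0-9]+\.[0-9]+$/ {sub(/.*=/, "", $1); print}'
}

sample_lines | strict_extract
```

With these input lines, only abc=1.2.3 passes the pattern, so the output should be the single line 123.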

Upvotes: 0

damienfrancois

Reputation: 59120

You can also simply use tr:

$ tr -d 'a-z.=' <<< 'abc=1.2.3'
123

EDIT: I missed the part of the question where 'I want to find all strings in a file that match...' So this might or might not work depending on the content of the other, unwanted, lines.
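To illustrate that caveat: tr alone rewrites every line it is given, so one possible workaround is to filter the lines first. This is only a sketch, it uses two commands rather than one, and the function names are hypothetical.

```shell
# Sample lines from the question.
sample_lines() {
    printf '%s\n' '1.2.3.4' 'abc=quack' 'qtip=1.2.3' 'abc=1.2.3' 'abc = 4.5.6'
}

# Keep only lines of the form abc=<digit>.<digit>.<digit>,
# then delete lowercase letters, periods and equals signs.
filter_then_tr() {
    grep -E '^abc=[0-9]\.[0-9]\.[0-9]$' | tr -d 'a-z.='
}

sample_lines | filter_then_tr
```

On the sample lines this should print only 123, whereas tr without the grep step would also emit 1234, an empty line, another 123, and a line containing 456 with leading spaces.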

Upvotes: 0

l'L'l

Reputation: 47169

This should work:

sed -E 's/abc=|([0-9])\./\1/g' file
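As written, the command above prints every line of the file, since there is no -n and no p flag, so non-matching lines come through too. If only matching lines should be printed, one possible variant (a sketch, not the answerer's command) restricts the substitutions with an address. The function name abc_digits is hypothetical.

```shell
# Sample lines from the question.
sample_lines() {
    printf '%s\n' '1.2.3.4' 'abc=quack' 'qtip=1.2.3' 'abc=1.2.3' 'abc = 4.5.6'
}

# -n suppresses default output; the address selects abc=<digit>.<digit>.<digit> lines only.
# Inside the braces: drop the abc= prefix, drop all periods, then print.
abc_digits() {
    sed -nE '/^abc=[0-9]\.[0-9]\.[0-9]$/{s/^abc=//;s/\.//g;p;}' "$@"
}

sample_lines | abc_digits
```

On the sample lines this should print only 123.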

Upvotes: 0

anubhava

Reputation: 785156

You can use awk instead to remove the periods from the part after abc=:

awk -F= '$1=="abc"{gsub(/\./, "", $2); print $2}' file
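With the sample file from the question, the command above would also print quack for the line abc=quack, because it only checks the key. A possible refinement, assuming the value must be three single digits separated by periods, adds a test on $2. The function name abc_value is made up for this sketch.

```shell
# Sample lines from the question.
sample_lines() {
    printf '%s\n' '1.2.3.4' 'abc=quack' 'qtip=1.2.3' 'abc=1.2.3' 'abc = 4.5.6'
}

# Require the key to be exactly abc and the value to look like <digit>.<digit>.<digit>,
# then strip the periods from the value and print it.
abc_value() {
    awk -F= '$1 == "abc" && $2 ~ /^[0-9]\.[0-9]\.[0-9]$/ {gsub(/\./, "", $2); print $2}' "$@"
}

sample_lines | abc_value
```

On the sample lines this should print only 123. The line abc = 4.5.6 is skipped because its first field is "abc " with a trailing space.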

Upvotes: 2

Avinash Raj

Reputation: 174706

You could try the GNU sed command below if the string abc=<digit1>\.<digit2>\.<digit3> can appear anywhere in a line:

sed -nr 's/.*abc=([0-9])\.([0-9])\.([0-9]).*/\1\2\3/p' file

OR

You could try the sed command below if the string abc= must be at the start of a line:

sed -nr 's/^abc=([0-9])\.([0-9])\.([0-9]).*/\1\2\3/p' file

Example:

$ cat file
abc=1.2.3
foo abc=4.5.6
bar
$ sed -nr 's/.*abc=([0-9])\.([0-9])\.([0-9]).*/\1\2\3/p' file
123
456
$ sed -nr 's/^abc=([0-9])\.([0-9])\.([0-9]).*/\1\2\3/p' file
123
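The commands above accept a single digit per group and allow trailing text after the third digit. If multi-digit groups should be allowed and nothing may follow them (both assumptions beyond the question), a variant could look like the sketch below. It uses -E rather than -r, which GNU sed and BSD sed both accept. The function name abc_groups and the extra line abc=10.20.30 are hypothetical.

```shell
# Sample lines from the question plus one hypothetical multi-digit line.
sample_lines() {
    printf '%s\n' '1.2.3.4' 'abc=quack' 'qtip=1.2.3' 'abc=1.2.3' 'abc = 4.5.6' 'abc=10.20.30'
}

# Capture three dot-separated digit groups after a leading abc=, anchored at both ends,
# and print only the lines where the substitution succeeded.
abc_groups() {
    sed -nE 's/^abc=([0-9]+)\.([0-9]+)\.([0-9]+)$/\1\2\3/p' "$@"
}

sample_lines | abc_groups
```

On these lines the output should be 123 followed by 102030.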

Upvotes: 1
