user3285014

Reputation: 319

Grep only second part after space

I have a parser in a shell script. Here is the input file it parses (input.txt):

system.switch_cpus.commit.swp_count                 0                       # Number of s/w prefetches committed
  system.switch_cpus.commit.refs                2682887                       # Number of memory references committed
  system.switch_cpus.commit.loads               1779328                       # Number of loads committed
  system.switch_cpus.commit.membars                   0                       # Number of memory barriers committed
  system.switch_cpus.commit.branches             921830                       # Number of branches committed
  system.switch_cpus.commit.vec_insts                 0                       # Number of committed Vector instructions.
  system.switch_cpus.commit.fp_insts                  0                       # Number of committed floating point instructions.
  system.switch_cpus.commit.int_insts          10000000                       # Number of committed integer instructions.

The script does the following:

 $ cpu1_name="system.switch_cpus"
 $ echo "$(grep "${cpu1_name}.commit.loads" ./input.txt |grep -Eo '[0-9]+')"
 1779328

This is the correct, expected output.

But for another file, the variable cpu1_name is changed to "system.switch_cpus_1", and running the same script now gives me two values.

New input file (new_input.txt):
system.switch_cpus_1.commit.swp_count               0                       # Number of s/w prefetches committed
 system.switch_cpus_1.commit.refs              2682887                       # Number of memory references committed
 system.switch_cpus_1.commit.loads             1779328                       # Number of loads committed
 system.switch_cpus_1.commit.membars                 0                       # Number of memory barriers committed
 system.switch_cpus_1.commit.branches           921830                       # Number of branches committed
 system.switch_cpus_1.commit.vec_insts               0                       # Number of committed Vector instructions.
 system.switch_cpus_1.commit.fp_insts                0                       # Number of committed floating point instructions.


Modified script lines:
$ cpu1_name="system.switch_cpus_1"
$ echo "$(grep "${cpu1_name}.commit.loads" ./new_input.txt |grep -Eo '[0-9]+')"
1
1779328

As you can see, the second grep matches every run of digits on the line, so with the changed variable name it also reports the "1" from "switch_cpus_1".

Is there a way to select only the second field, i.e. only 1779328? I know I can use awk '{print $2}', but that would mean changing a lot of lines in the script, so I was wondering if there is an easier trick that works with the existing script lines.

Thanks in advance

Upvotes: 4

Views: 3691

Answers (4)

Wiktor Stribiżew

Reputation: 626689

Since _ is considered a word character, there is no word boundary between _ and 1. There are, however, word boundaries on both sides of the expected numbers.

Thus, all you need to do is use your pattern with word boundaries. You may use the -w option to match whole words only, or use \b or \< / \>, whichever your grep supports:

grep -Ewo '[0-9]+'
grep -Eo '\b[0-9]+\b'
grep -Eo '\<[0-9]+\>'

See the online demo.
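A quick way to check this locally, assuming GNU grep (which accepts -w together with -o), is to run the question's digit pattern against one line copied from new_input.txt, once without and once with -w. This is only a sketch; the sample line is hard-coded rather than read from the real file.

```shell
#!/bin/sh
# One line copied from new_input.txt in the question.
line='  system.switch_cpus_1.commit.loads             1779328                       # Number of loads committed'

# Original pattern: every run of digits matches, including the "1" in "switch_cpus_1".
all=$(printf '%s\n' "$line" | grep -Eo '[0-9]+')

# With -w, a match must form a whole word. The "1" is preceded by "_" (a word
# character), so it is rejected; 1779328 is surrounded by spaces, so it is kept.
whole=$(printf '%s\n' "$line" | grep -Ewo '[0-9]+')

printf 'without -w:\n%s\n' "$all"
printf 'with -w:\n%s\n' "$whole"
```

On that line, the first command should print 1 and 1779328, while the -w version should print only 1779328.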

Note that you may also use sed to extract the second non-whitespace chunk from the lines:

sed -E 's/^\s*\S+\s+(\S+).*/\1/'

See this demo.

Details

  • ^ - start of line
  • \s* - 0+ whitespace chars
  • \S+ - 1+ chars other than whitespace
  • \s+ - 1+ whitespace chars
  • (\S+) - 1+ non-whitespace chars (Group 1, just what we keep with \1 placeholder in the replacement pattern)
  • .* - the rest of the line.
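The sed command can be dropped into the question's existing pipeline: keep the first grep to pick the line and let sed replace the second grep. The sketch below assumes a sed that supports -E, and spells \s and \S as the POSIX bracket expressions [[:space:]] and [^[:space:]], since \s and \S are not guaranteed in every sed. The two sample lines are copied from new_input.txt.

```shell
#!/bin/sh
cpu1_name="system.switch_cpus_1"

# Two sample lines copied from new_input.txt in the question.
sample=' system.switch_cpus_1.commit.refs              2682887                       # Number of memory references committed
 system.switch_cpus_1.commit.loads             1779328                       # Number of loads committed'

# The existing grep selects the line; sed then keeps only the second
# whitespace-separated field (capture group 1) and drops everything else.
loads=$(printf '%s\n' "$sample" | grep "${cpu1_name}.commit.loads" |
  sed -E 's/^[[:space:]]*[^[:space:]]+[[:space:]]+([^[:space:]]+).*/\1/')

echo "$loads"
```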

Upvotes: 2

Martin Heralecký

Reputation: 5779

The values (the numbers you are trying to get) are surrounded by spaces. So you can use a positive lookbehind (?<=pattern) and a positive lookahead (?=pattern) to find only those matches that have a space on each side.

Note that these lookaround assertions are PCRE syntax, so you need grep's -P flag (available in GNU grep) to use them.
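Put together as a command, this could look like the sketch below, assuming a grep with -P support (GNU grep built with PCRE). It applies a positive lookbehind and a positive lookahead for a literal space to one line copied from new_input.txt.

```shell
#!/bin/sh
# One line copied from new_input.txt in the question.
line='  system.switch_cpus_1.commit.loads             1779328                       # Number of loads committed'

# (?<= )  positive lookbehind: the digits must be preceded by a space
# (?= )   positive lookahead: the digits must be followed by a space
# The "1" in "switch_cpus_1" is preceded by "_", so it does not match.
value=$(printf '%s\n' "$line" | grep -oP '(?<= )[0-9]+(?= )')

echo "$value"
```

The lookahead also requires a space after the number; that holds for this data because every value is followed by padding before the # comment.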

Upvotes: 1

John1024

Reputation: 113814

Awk can do it all in one step (no pipeline needed):

awk -v x="${cpu1_name}.commit.loads" '$1==x{print $2}' input.txt

This should be portable and work with any POSIX awk.

Example

$ cpu1_name="system.switch_cpus"
$ awk -v x="${cpu1_name}.commit.loads" '$1==x{print $2}' input.txt
1779328
$ cpu1_name="system.switch_cpus_1"
$ awk -v x="${cpu1_name}.commit.loads" '$1==x{print $2}' new_input.txt
1779328

How it works

  • -v x="${cpu1_name}.commit.loads"

    This defines an awk variable x that contains the name that we are looking for.

  • $1==x{print $2}

    If the first field, $1, is equal to x, then print the second field, $2.
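Since the question worries about changing many script lines, one option is to define the awk command once as a shell function and call it wherever a value is needed. The function name get_stat and its (name, file) argument order are hypothetical, and the sample file is built from two lines of new_input.txt in a temporary file.

```shell
#!/bin/sh
# Hypothetical helper: print field 2 of the line whose first field is exactly "$1",
# reading from file "$2". Same awk program as above, written once.
get_stat() {
  awk -v x="$1" '$1 == x { print $2 }' "$2"
}

# Sample data: two lines copied from new_input.txt in the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
 system.switch_cpus_1.commit.loads             1779328                       # Number of loads committed
 system.switch_cpus_1.commit.branches           921830                       # Number of branches committed
EOF

cpu1_name="system.switch_cpus_1"
loads=$(get_stat "${cpu1_name}.commit.loads" "$tmp")
branches=$(get_stat "${cpu1_name}.commit.branches" "$tmp")
echo "$loads $branches"

rm -f "$tmp"
```

Because $1 must equal the full name exactly, a prefix such as system.switch_cpus cannot accidentally match lines for system.switch_cpus_1.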

Upvotes: 1

Allan

Reputation: 12438

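You can just change your grep command to:

grep -oP '(?<=\s)[0-9]+'

The lookbehind (?<=\s) requires a whitespace character before each digit run, so the 1 after the underscore in switch_cpus_1 is no longer matched. With -P you can also write the digit class as \d:

grep -oP '(?<=\s)\d+'

or additionally require whitespace after the digits with a lookahead: grep -oP '(?<=\s)\d+(?=\s)' or grep -oP '(?<=\s)[0-9]+(?=\s)'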

Upvotes: 0
