Reputation: 319
I have a parser in a shell script. Here is the input file it parses (input.txt):
system.switch_cpus.commit.swp_count 0 # Number of s/w prefetches committed
system.switch_cpus.commit.refs 2682887 # Number of memory references committed
system.switch_cpus.commit.loads 1779328 # Number of loads committed
system.switch_cpus.commit.membars 0 # Number of memory barriers committed
system.switch_cpus.commit.branches 921830 # Number of branches committed
system.switch_cpus.commit.vec_insts 0 # Number of committed Vector instructions.
system.switch_cpus.commit.fp_insts 0 # Number of committed floating point instructions.
system.switch_cpus.commit.int_insts 10000000 # Number of committed integer instructions.
The script does the following:
$ cpu1_name="system.switch_cpus"
$ echo "$(grep "${cpu1_name}.commit.loads" ./input.txt |grep -Eo '[0-9]+')"
correct expected output: 1779328
But for another file the variable "cpu1_name" is changed to "system.switch_cpus_1". Running the same script on it now gives me two values instead of one.
New input file:
system.switch_cpus_1.commit.swp_count 0 # Number of s/w prefetches committed
system.switch_cpus_1.commit.refs 2682887 # Number of memory references committed
system.switch_cpus_1.commit.loads 1779328 # Number of loads committed
system.switch_cpus_1.commit.membars 0 # Number of memory barriers committed
system.switch_cpus_1.commit.branches 921830 # Number of branches committed
system.switch_cpus_1.commit.vec_insts 0 # Number of committed Vector instructions.
system.switch_cpus_1.commit.fp_insts 0 # Number of committed floating point instructions.
Modified Script line:
$ cpu1_name="system.switch_cpus_1"
$ echo "$(grep "${cpu1_name}.commit.loads" ./new_input.txt |grep -Eo '[0-9]+')"
1
1779328
As you can see, the piped grep is searching for any number and reporting an extra "1" due to the changed variable name.
Is there a way to select only the second number (i.e. only 1779328)?
I know I can use awk '{print $2}', but that would mean changing a lot of lines in the script. So I was wondering whether there is an easier trick that works with the existing script lines.
Thanks in advance
Upvotes: 4
Views: 3691
Reputation: 626689
Since _ is considered a word character, there is no word boundary between the _ and the 1. There are word boundaries on both sides of the expected numbers.
Thus, all you need to do is use your pattern with word boundaries. You may use the -w option to match whole words only, or choose between \b and \< / \>, whichever your grep supports:
grep -Ewo '[0-9]+'
grep -Eo '\b[0-9]+\b'
grep -Eo '\<[0-9]+\>'
See the online demo.
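To make the effect of the word boundary concrete, here is a minimal sketch that runs the original pattern with and without -w on one line copied from the question's second input file. The variable names (line, plain, whole) are only for illustration, and the -o option assumes a grep that supports it, such as GNU grep.

```shell
# Sample line copied from the second input file in the question.
line='system.switch_cpus_1.commit.loads 1779328 # Number of loads committed'

# Without -w, the "1" in "cpus_1" is matched as well as the value.
plain=$(printf '%s\n' "$line" | grep -Eo '[0-9]+')

# With -w, only digit runs that form a whole word are matched.
whole=$(printf '%s\n' "$line" | grep -Ewo '[0-9]+')

printf 'plain:\n%s\n' "$plain"
printf 'whole word:\n%s\n' "$whole"
```

Because only the option list changes, the existing script lines can keep their structure.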
Note you may also use sed to extract the second non-whitespace chunk from the lines:
sed -E 's/^\s*\S+\s+(\S+).*/\1/'
See this demo.
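As a rough sketch of how this sed command could replace the second grep in the original pipeline, the snippet below feeds one sample line from the question through it. It assumes GNU sed, since \s and \S are not part of POSIX regular expressions; the variable names line and value are illustrative.

```shell
cpu1_name="system.switch_cpus_1"

# Sample line copied from the second input file in the question.
line='system.switch_cpus_1.commit.loads 1779328 # Number of loads committed'

# Select the line as before, then keep only the second whitespace-separated chunk.
value=$(printf '%s\n' "$line" \
    | grep "${cpu1_name}.commit.loads" \
    | sed -E 's/^\s*\S+\s+(\S+).*/\1/')

echo "$value"
```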
Details
^ - start of line
\s* - 0+ whitespace chars
\S+ - 1+ chars other than whitespace
\s+ - 1+ whitespace chars
(\S+) - 1+ non-whitespace chars (Group 1, just what we keep with the \1 placeholder in the replacement pattern)
.* - the rest of the line.
Upvotes: 2
Reputation: 5779
The values (the numbers you are trying to get) are surrounded by spaces. So you can use a positive lookbehind (?<=pattern) and a positive lookahead (?=pattern) to find only those matches that have a space on each side.
Note that to use these lookaround assertions you need the -P flag in grep.
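A minimal sketch of that idea, assuming a GNU grep built with PCRE support (the -P flag is not available in every grep), with a literal space in each lookaround and illustrative variable names:

```shell
# Sample line copied from the second input file in the question.
line='system.switch_cpus_1.commit.loads 1779328 # Number of loads committed'

# Match digit runs that have a space immediately before and after them.
value=$(printf '%s\n' "$line" | grep -oP '(?<= )[0-9]+(?= )')

echo "$value"
```

The "1" in "cpus_1" is preceded by an underscore rather than a space, so the lookbehind rejects it.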
Upvotes: 1
Reputation: 113814
Awk can do it all in one step (no pipeline needed):
awk -v x="${cpu1_name}.commit.loads" '$1==x{print $2}' input.txt
This should be portable and work with any POSIX awk.
$ awk -v x="${cpu1_name}.commit.loads" '$1==x{print $2}' input.txt
1779328
$ awk -v x="${cpu1_name}.commit.loads" '$1==x{print $2}' new_input.txt
1779328
-v x="${cpu1_name}.commit.loads"
This defines an awk variable x that contains the name that we are looking for.
$1==x{print $2}
If the first field, $1, is equal to x, then print the second field, $2.
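If many script lines need the same change, one option is to wrap the awk call in a small shell function so that each call site stays short. The sketch below is only an illustration: get_stat is a hypothetical helper name, and the temporary file stands in for the real stats file.

```shell
# Hypothetical helper: print the value of one statistic from a stats file.
# Usage: get_stat FILE NAME
get_stat() {
    awk -v x="$2" '$1 == x { print $2 }' "$1"
}

# Build a small sample file from lines in the question.
stats_file=$(mktemp)
cat > "$stats_file" <<'EOF'
system.switch_cpus_1.commit.refs 2682887 # Number of memory references committed
system.switch_cpus_1.commit.loads 1779328 # Number of loads committed
EOF

cpu1_name="system.switch_cpus_1"
loads=$(get_stat "$stats_file" "${cpu1_name}.commit.loads")
echo "$loads"

rm -f "$stats_file"
```

Because the comparison is an exact match on the first field, the digits inside the statistic name never reach the output.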
Upvotes: 1
Reputation: 12438
You can just change your grep command to:
grep -oP '(?<=\s)[0-9]+'
This requires a whitespace character before each run of digits. The same thing can be written with \d:
grep -oP '(?<=\s)\d+'
To also require whitespace after the digits, use:
grep -oP '(?<=\s)\d+(?=\s)'
or:
grep -oP '(?<=\s)[0-9]+(?=\s)'
Upvotes: 0