nikost

Reputation: 858

How to get the first integer from all lines that match a pattern?

I have a file and I only want to find lines that have "here". In each of these lines there are multiple string and integer values (see example below). I only want the first integer of each line that matches the pattern.

I have created a solution that uses a bash script, but is there a simpler method I am missing? I was hoping something like grep -w here -Eo [0-9] file would work. However, when I try that, it treats everything that comes after "here" as a file name.

STEP 1 STAGE 1 here other info
foo
bar
STEP 2 STAGE 1 here other info
more
foo
bar
STEP 3 STAGE 1 here other info

For this file the desired output would be

1
2
3

Upvotes: 1

Views: 113

Answers (5)

RARE Kpop Manifesto

Reputation: 2837

gawk 'sub(__,_, $!(NF=NF)) * NF' FS='^[^0-9]*' __='([^0-9].*)?$' OFS= 
  • or
mawk -F'^[^0-9]*' '$!NF = int($(_*(NF=NF)))'
1
2
3

Upvotes: 0

anubhava

Reputation: 785286

This simpler awk should work for you:

awk '/ here / {sub(/^[^0-9]+/, ""); print $1+0}' file

1
2
3
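As a small illustration of what the sub() call and the $1+0 conversion do on less tidy input, here is a sketch with made-up lines (the second and third are hypothetical and not from the question). The +0 forces a numeric conversion, so trailing non-digit characters in the first field are dropped, and a matching line with no digits at all prints 0.

```shell
# Three sample lines: the question's format, an integer glued to letters,
# and a line containing " here " but no digits.
result=$(printf '%s\n' \
  'STEP 1 STAGE 1 here other info' \
  'item42b here now' \
  'no digits here at all' |
  awk '/ here / {sub(/^[^0-9]+/, ""); print $1+0}')
printf '%s\n' "$result"
```

If lines without any integer should be skipped rather than reported as 0, an extra condition such as /[0-9]/ in the pattern would be one way to handle that.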

Upvotes: 2

M. Nejat Aydin

Reputation: 10133

grep is not the right command for this. I'd use sed:

sed -n '/ here /s/[^0-9]*\([0-9]*\).*/\1/p' file
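To try the command against the sample from the question, something like the following should do. The temporary file is only there to make the sketch self-contained. The -n option suppresses the default output, and the p flag prints only the lines where the substitution was applied.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  'STEP 1 STAGE 1 here other info' \
  'foo' \
  'bar' \
  'STEP 2 STAGE 1 here other info' \
  'more' \
  'foo' \
  'bar' \
  'STEP 3 STAGE 1 here other info' > "$tmp"

# Only lines matching / here / are edited and printed:
# [^0-9]* skips leading non-digits, \([0-9]*\) captures the first integer.
result=$(sed -n '/ here /s/[^0-9]*\([0-9]*\).*/\1/p' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

For the sample this prints 1, 2 and 3 on separate lines.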

Upvotes: 3

The fourth bird

Reputation: 163362

Another variant, using GNU grep with -P for Perl-compatible regular expressions (if supported):

grep -oP "^\D*\K\d+(?=.*\bhere\b)" file

The pattern matches:

  • ^ Start of string
  • \D* Match optional non-digits
  • \K Forget what is matched so far
  • \d+ Match 1+ digits
  • (?=.*\bhere\b) Positive lookahead, assert here to the right

Output

1
2
3
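A quick sketch of how the word boundaries and the lookahead behave, using made-up lines that are not from the question. It first checks that the local grep accepts -P, since not every build does. Note that "there" is rejected by \bhere\b, and that a line where "here" only appears before the first integer does not match, because the lookahead searches to the right of the digits.

```shell
# Check whether this grep supports -P before relying on it.
if printf 'a1\n' | grep -qP '\d' 2>/dev/null; then
  have_pcre=yes
  # Hypothetical test lines: only the first and last should produce output.
  result=$(printf '%s\n' \
    'ID 7 here' \
    'ID 8 there' \
    'here 10' \
    'x 11 y here 12' |
    grep -oP '^\D*\K\d+(?=.*\bhere\b)')
  printf '%s\n' "$result"
else
  have_pcre=no
  echo 'grep -P is not available here'
fi
```

With PCRE support this should print 7 and 11.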

Upvotes: 3

RavinderSingh13

Reputation: 133538

With GNU awk you could try the following awk code. Written and tested with your shown samples.

awk '
match($0,/(^|[[:space:]]+)([0-9]+)[[:space:]]+.*here /,arr){
  print arr[2]
}
' Input_file

Explanation: The match function of GNU awk is applied to each line with the regex /(^|[[:space:]]+)([0-9]+)[[:space:]]+.*here /, which only matches when a whitespace-delimited integer is followed later in the line by "here" and a space. The regex has 2 capturing groups, and their values are stored in an array named arr at indexes 1 and 2 respectively. When a line matches, the 2nd element of that array is printed, which is the required value (the first integer of the line).
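The three-argument form of match is a GNU awk extension. Where only a POSIX awk is available, a rough equivalent can be sketched with the two-argument match and the RSTART and RLENGTH variables. This is a different, simpler technique than the code above: it takes the leftmost run of digits on any line containing " here " and does not require the integer to be whitespace-delimited or to come before "here".

```shell
# Sample input from the question, fed through a pipe.
result=$(printf '%s\n' \
  'STEP 1 STAGE 1 here other info' \
  'foo' \
  'bar' \
  'STEP 2 STAGE 1 here other info' \
  'more' \
  'STEP 3 STAGE 1 here other info' |
  awk '/ here / && match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH) }')
printf '%s\n' "$result"
```

For the shown sample this gives the same 1, 2, 3 as the GNU awk version.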

Upvotes: 2
