Uvais Ibrahim

Reputation: 741

Regex to get match on entire string

How to match a word before a specific character using sed in bash?

In my scenario, I need to extract the metric names from the string. Each name occurs immediately before a {.

Below is the string I am working on:

sum(rate(nginx_ingress_controller_request_duration_seconds_sum{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))/sum(rate(nginx_ingress_controller_request_duration_seconds_count{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))

The output I need is:

nginx_ingress_controller_request_duration_seconds_sum
nginx_ingress_controller_request_duration_seconds_count

I am not a regex expert, so any help would be much appreciated.

Upvotes: 0

Views: 259

Answers (3)

Sundeep

Reputation: 23677

With GNU grep:

grep -oP '\(\K[^({]+(?={)'

This prints each result on a separate line. \(\K checks for a ( character and then resets the start of the matched portion (since the ( isn't needed in the output). [^({]+ matches one or more characters other than ( and {. (?={) makes sure the matched portion is followed by a { character, without including the { in the output.
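A minimal sketch of running the command above on the question's string, assuming GNU grep built with PCRE support (needed for -P). The variable names query and metrics are hypothetical; the string is kept in single quotes so that $namespace and the backslashes are not interpreted by the shell.

```shell
# The question's expression, single-quoted so $namespace and \" stay literal.
# (query and metrics are hypothetical variable names.)
query='sum(rate(nginx_ingress_controller_request_duration_seconds_sum{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))/sum(rate(nginx_ingress_controller_request_duration_seconds_count{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))'

# -o prints every match on its own line; -P enables PCRE, which provides
# \K (drop the "(" from the match) and the (?={) lookahead.
metrics=$(printf '%s\n' "$query" | grep -oP '\(\K[^({]+(?={)')

printf '%s\n' "$metrics"
```

This prints the two metric names, one per line. printf is used instead of echo so the backslashes in the string are passed through unchanged.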

If you know that the required portion can contain only word characters (letters, digits and underscores), you can also use:

grep -oP '\w+(?={)'
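On the question's string both patterns print the same two names. The sketch below, which uses a hypothetical input whose name contains a hyphen (not a word character), shows where they differ; it again assumes GNU grep with -P support.

```shell
# Hypothetical input: the name "my-metric" contains a non-word character.
sample='count(my-metric{job="x"})'

# Everything after the ( and before the {, hyphen included.
with_class=$(printf '%s\n' "$sample" | grep -oP '\(\K[^({]+(?={)')

# Only the run of word characters directly before the {.
with_word=$(printf '%s\n' "$sample" | grep -oP '\w+(?={)')

printf '%s\n%s\n' "$with_class" "$with_word"
```

The first pattern yields my-metric, while \w+(?={) yields only metric, so the shorter pattern is safe only when names are known to consist of word characters.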

Upvotes: 2

potong

Reputation: 58473

This might work for you (GNU sed):

sed -E '/^(\w+)\{/{s//\1\n/;P;D};s/^\w*\W/\n/;D' file

If the pattern space starts with a valid string followed by a {, replace the { with a newline, print and delete the first line of the pattern space, and repeat.

Otherwise, remove the leading word (if any) together with the non-word character that follows it, and repeat until all strings are matched.

N.B. A valid string in this case is a word, i.e. a run of alphanumeric characters or underscores.
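To see the loop in action, this sketch feeds the question's string to the same sed command on standard input instead of a file. It assumes GNU sed, since \w, \W and \n in the replacement are GNU extensions; the variable names are hypothetical.

```shell
# The question's expression (hypothetical variable names: query, metrics).
query='sum(rate(nginx_ingress_controller_request_duration_seconds_sum{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))/sum(rate(nginx_ingress_controller_request_duration_seconds_count{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))'

# Each cycle either prints a leading "name{" match (s//\1\n/ then P) or strips
# one leading word plus the non-word character after it. D then deletes up to
# the first newline and restarts the cycle on what is left of the line.
metrics=$(printf '%s\n' "$query" | sed -E '/^(\w+)\{/{s//\1\n/;P;D};s/^\w*\W/\n/;D')

printf '%s\n' "$metrics"
```

Because D restarts the cycle without reading new input while a newline remains, the whole line is consumed piece by piece and each name is printed as soon as it reaches the front of the pattern space.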

Upvotes: 0

mattb

Reputation: 3063

This finds two occurrences on the line and writes each one on a separate line of new_file (with GNU sed):

sed 's/.*(\(.*\){.*(\(.*\){.*/\1\n\2/' your_file > new_file

Contents of new_file:

nginx_ingress_controller_request_duration_seconds_sum
nginx_ingress_controller_request_duration_seconds_count

Here is how it works:

  • /.*(: Match everything from the start of the line up to a ( (.* is greedy, so this is the last ( that still lets the rest of the pattern match)
  • \(.*\): Remember the text matched between \( and \) (this is called a capture group)
  • {.*(: Match the {, then everything up to a (
  • \(.*\): Remember a second piece of text using a second capture group
  • {.*: Match the { and the rest of the line
  • /\1\n\2/: Replace the whole line with the two captured groups, separated by a newline (\n)
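A small sketch of the command on the question's string, plus a hypothetical line with only one occurrence, to show that the pattern is tied to exactly the two-occurrence layout. It assumes GNU sed for the \n in the replacement; variable names are hypothetical.

```shell
query='sum(rate(nginx_ingress_controller_request_duration_seconds_sum{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))/sum(rate(nginx_ingress_controller_request_duration_seconds_count{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))'

# Two occurrences: the two capture groups are printed on separate lines.
two=$(printf '%s\n' "$query" | sed 's/.*(\(.*\){.*(\(.*\){.*/\1\n\2/')

# Hypothetical line with a single occurrence: the regex cannot match,
# so the s command does nothing and sed prints the line unchanged.
one=$(printf '%s\n' 'rate(foo{x})' | sed 's/.*(\(.*\){.*(\(.*\){.*/\1\n\2/')

printf '%s\n' "$two" "$one"
```

The second case prints rate(foo{x}) as-is, which is why a different approach is needed when the number of occurrences varies.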

Edit

Another approach, which works for any number of occurrences, is to insert a newline plus a unique marker around the part of the string you're interested in, so that each wanted part ends up alone on its own line while every other line contains the marker. Then use grep to remove the marked lines:

sed 's/(/BADLINES\n/g; s/{/\nBADLINES/g' your_file | grep -v BADLINES

The first part (sed 's/(/BADLINES\n/g; s/{/\nBADLINES/g' your_file) produces:

sumBADLINES
rateBADLINES
nginx_ingress_controller_request_duration_seconds_sum
BADLINESnamespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))/sumBADLINES
rateBADLINES
nginx_ingress_controller_request_duration_seconds_count
BADLINESnamespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))

and piping that through grep -v BADLINES (which drops every line containing BADLINES) produces:

nginx_ingress_controller_request_duration_seconds_sum
nginx_ingress_controller_request_duration_seconds_count
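A sketch of the same pipeline on a hypothetical line with three metric names, assuming GNU sed (for \n in the replacement):

```shell
# Hypothetical input with three metric names on one line.
line='sum(up{job="a"})+sum(up_total{job="b"})+count(node_load1{job="c"})'

# Every "(" becomes "BADLINES" + newline, so the text before it is marked;
# every "{" becomes newline + "BADLINES", so the text after it is marked.
# Only the names, which sit between a "(" and a "{", end up unmarked.
names=$(printf '%s\n' "$line" | sed 's/(/BADLINES\n/g; s/{/\nBADLINES/g' | grep -v BADLINES)

printf '%s\n' "$names"
```

This prints up, up_total and node_load1 on separate lines. Two caveats: the marker must not occur in the input, and text after the last ( on a line survives if no { follows it (for example, a line like sum(x) prints x)).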

Upvotes: 0
