Show just specific group of regexp and remove rest of the line in bash with sed

Question

I have an access log with many lines in the following format:

1.2.3.4:443  - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"

I just want to get the response time, so in this example 2/2125012. My idea was to write a regex pattern that matches the content of the brackets in one group and everything before/after it in other groups, so that I could replace the entire line with just this value:

^(.*)RESPONSE_TIME: \[([^\]]+)\](.*)$

Using regex101 with an example input string, it gave me `2/2125012` as the second group, as expected:

Group 2 2/2125012

To use this pattern with sed, I escaped the parentheses and brackets like this:

$ sed 's#^\(.*\)RESPONSE_TIME: \[\([\^\]]+\)\]\(.*\)$#\2#g' testfile
1.2.3.4:443  - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"

Why is nothing replaced? I escaped ( and [.

It seems that this has something to do with the square brackets:

$ sed 's#^\(.*\)RESPONSE_TIME: \[\(.*\)\] (micro\(.*\)$#\2#g' testfile
2/2125012

This worked. But this pattern is not very specific. I'd like to make it more specific by using e.g. [0-9]+/[0-9]+ for the pattern inside the brackets instead of the (.*) wildcard pattern.

Wiktor Stribiżew · Accepted Answer

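Your pattern contains an issue related to the use of POSIX BRE/ERE: a backslash is not an escape char inside a bracket expression, so in [\^\]]+ the [\^\] part is a complete bracket expression that matches a \ or ^ char, and the ]+ after it matches a literal ]+ substring, since + is an ordinary char in POSIX BRE (demo). To match any char other than ], use [^]], and quantify it with * (which matches 0 or more occurrences) instead of +, or with \+ in GNU sed, or with \{1,\} in a generic POSIX BRE.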
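To compare the quantifier options named above side by side, here is a small sketch. The sample line is a shortened version of the log entry from the question (shortening it is my assumption, made for readability), and the variable names are purely illustrative. The \+ variant relies on a GNU extension, so it may not work with other sed implementations.

```shell
# Shortened sample line (an assumption; the real log line is longer).
line='SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444]'

# * : zero or more chars other than ]
star_out=$(printf '%s\n' "$line" | sed -n 's#.*RESPONSE_TIME: \[\([^]]*\).*#\1#p')

# \{1,\} : one or more, written as a POSIX BRE interval
posix_out=$(printf '%s\n' "$line" | sed -n 's#.*RESPONSE_TIME: \[\([^]]\{1,\}\).*#\1#p')

# \+ : one or more, GNU sed extension in BRE
gnu_out=$(printf '%s\n' "$line" | sed -n 's#.*RESPONSE_TIME: \[\([^]]\+\).*#\1#p')

# Unescaped + is a literal plus sign in BRE, so this pattern does not match
# the sample line and, with -n ... p, nothing is printed.
plain_out=$(printf '%s\n' "$line" | sed -n 's#.*RESPONSE_TIME: \[\([^]]+\).*#\1#p')

printf 'star:  %s\nposix: %s\ngnu:   %s\nplain: %s\n' \
  "$star_out" "$posix_out" "$gnu_out" "$plain_out"
```

The first three variants should extract the bracket content, while the last one leaves the variable empty because the literal + never occurs after the opening bracket.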

You may fix the sed command by using

sed -n 's#.*RESPONSE_TIME: \[\([^]]*\).*#\1#p' testfile

See the online sed demo.

Details

-n - suppresses the default line output
.*RESPONSE_TIME: \[\([^]]*\).* - matches any 0+ chars, RESPONSE_TIME:, space, [, then captures into Group 1 any zero or more chars other than ], and then matches the rest of the string
\1 - replaces the match with the Group 1 value
p - prints the result of the substitution.
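For the more specific pattern the question asks about ([0-9]+/[0-9]+ inside the brackets), one possible approach is to switch to ERE syntax with the -E option, where ( ) and + work without backslashes. This is a sketch rather than part of the original answer: it assumes a sed that supports -E (GNU and BSD sed do), and the shortened sample lines and variable names are illustrative only.

```shell
# Shortened sample lines (assumptions; the real log lines are longer).
good='SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444]'
bad='SIZE: [1288] RESPONSE_TIME: [-] (microseconds) WAS:[was.internal:9444]'

# ERE: \[ is a literal opening bracket; the closing ] needs no escape
# outside a bracket expression. Group 1 requires digits/digits.
pattern='s#.*RESPONSE_TIME: \[([0-9]+/[0-9]+)].*#\1#p'

good_out=$(printf '%s\n' "$good" | sed -nE "$pattern")
bad_out=$(printf '%s\n' "$bad" | sed -nE "$pattern")

printf 'good: %s\nbad:  %s\n' "$good_out" "$bad_out"
```

Because of -n and the p flag, a line whose RESPONSE_TIME field does not have the digits/digits shape is skipped instead of being printed unchanged.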
