Lion

Reputation: 17879

Show only a specific regexp group and remove the rest of the line with sed in bash

I have an access log with many lines in the following format:

1.2.3.4:443  - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"

I just want to get the response time, which in this example is 2/2125012. My idea was to write a regex pattern that captures the content of the brackets in one group and everything before/after it in other groups, so I could replace the entire line with just this value:

^(.*)RESPONSE_TIME: \[([^\]]+)(.*)$

Testing it on regex101 with an example input string gave me the expected second group:

Group 2 2/2125012

To use this pattern with sed, I escaped the parentheses and brackets like this:

$ sed 's#^\(.*\)RESPONSE_TIME: \[\([\^\]]+\)\(.*\)$#\2#g' testfile
1.2.3.4:443  - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"

Why is nothing replaced? I escaped ( and [.

It seems that this has something to do with the square brackets:

$ sed 's#^\(.*\)RESPONSE_TIME: \[\(.*\)\] (micro\(.*\)$#\2#g' testfile
2/2125012

This worked. But this pattern is not very specific. I'd like to make it more specific, e.g. by using [0-9]+/[0-9]+ for the part inside the brackets instead of the (.*) wildcard.

Upvotes: 1

Views: 64

Answers (2)

Ed Morton

Reputation: 203209

$ awk -F'[][]' '{print $14}' file
2/2125012

If that's not all you need, then edit your question to provide more truly representative sample input/output, including cases that the above doesn't work for.
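A variation on the same -F'[][]' split, sketched under the assumption (not stated in the question) that the number of bracketed fields before RESPONSE_TIME may differ between lines: instead of hard-coding $14, it looks for the field that ends with the `RESPONSE_TIME: ` label and prints the field after it. The helper name `response_time_awk` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value that follows the "RESPONSE_TIME: " label.
# Splitting on [ or ] makes the label and its value adjacent fields.
response_time_awk() {
  awk -F'[][]' '{
    # Find the field ending with the label; the value is the next field.
    for (i = 1; i < NF; i++)
      if ($i ~ /RESPONSE_TIME: $/) { print $(i + 1); break }
  }' "$@"
}

# Sample line from the question.
sample='1.2.3.4:443  - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"'

printf '%s\n' "$sample" | response_time_awk   # prints 2/2125012
```

With a file argument it works the same way, e.g. response_time_awk file.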

Upvotes: 1

Wiktor Stribiżew

Reputation: 626689

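Your pattern contains an issue related to the use of POSIX BRE/ERE. Inside a bracket expression a backslash is not an escape character, so [\^\]]+ matches a single char that is either \ or ^, followed by a literal ] and a literal + char (demo). To exclude ], put it right after the ^, as in [^]]. Also, + is an ordinary character in POSIX BRE: you need to use * (which matches 0 or more occurrences) instead of +, or \+ in GNU sed, or \{1,\} in a generic POSIX BRE.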
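A small sketch for checking the points above with a POSIX sed (the \+ line assumes GNU sed). The first command feeds the original bracket expression two made-up strings to show what it actually matches; the other two show ways to require at least one non-] character inside the brackets.

```shell
#!/bin/sh
# Sample line from the question.
sample='1.2.3.4:443  - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"'

# [\^\] is a set of two chars (\ and ^); the next ] and the + are literals.
printf '%s\n' 'a\]+b' 'a^]+b' | sed 's#[\^\]]+#X#'   # prints aXb twice

# Portable POSIX BRE: one or more non-] characters via \{1,\}
printf '%s\n' "$sample" | sed -n 's#.*RESPONSE_TIME: \[\([^]]\{1,\}\)].*#\1#p'

# GNU sed only: \+ is a GNU extension to BRE
printf '%s\n' "$sample" | sed -n 's#.*RESPONSE_TIME: \[\([^]]\+\)].*#\1#p'
```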

You may fix the sed command by using

sed -n 's#.*RESPONSE_TIME: \[\([^]]*\).*#\1#p' testfile

See the online sed demo.

Details

  • -n - suppresses the default line output
  • .*RESPONSE_TIME: \[\([^]]*\).* - matches any 0+ chars, RESPONSE_TIME:, space, [, then captures into Group 1 any zero or more chars other than ], and then matches the rest of the string
  • \1 - replaces the match with the Group 1 value
  • p - prints the result of the substitution.
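For the more specific digits/digits pattern asked about in the question, here is a sketch in the same style, assuming the value is always two runs of digits separated by /. Lines that don't match print nothing because of -n and p, and # is used as the s delimiter so the / needs no escaping. The function names are hypothetical, and the -E form assumes a sed that accepts -E (GNU and BSD sed do).

```shell
#!/bin/sh
# Hypothetical helper, portable POSIX BRE: digits, a slash, digits.
response_time_bre() {
  sed -n 's#.*RESPONSE_TIME: \[\([0-9]\{1,\}/[0-9]\{1,\}\)].*#\1#p' "$@"
}

# Hypothetical helper, same pattern written as POSIX ERE.
response_time_ere() {
  sed -E -n 's#.*RESPONSE_TIME: \[([0-9]+/[0-9]+)].*#\1#p' "$@"
}

# Sample line from the question.
sample='1.2.3.4:443  - - [11/Mar/2020:09:41:05 +0100] RESPONSE_CODE:[200] AGE: [-] CACHE_MISS: [-] CACHE-STATUS: [-] SIZE: [1288] RESPONSE_TIME: [2/2125012] (microseconds) WAS:[was.internal:9444] "PUT /kudosboards/node/a8740540-801a-43a6-822a-d58a2424fd3f HTTP/1.1" 200 REFERER: "https://ihs.internal/kudosboards/"'

printf '%s\n' "$sample" | response_time_bre   # prints 2/2125012
printf '%s\n' "$sample" | response_time_ere   # prints 2/2125012

# A value that is not digits/digits is skipped (no output).
printf '%s\n' 'RESPONSE_TIME: [-] (microseconds)' | response_time_bre
```

Both functions also accept file names, e.g. response_time_bre testfile.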

Upvotes: 1
