Reputation: 91
I am trying to match specific values in a file using the "tail" plugin for collectd. This plugin only supports POSIX ERE syntax. Sample file below:
capture.kernel_packets | Total | 25496291490
capture.kernel_drops | Total | 873229305
Attempt #1:
capture\.kernel_packets.*Total.*\|\s+(\d+)
I want to extract the value "25496291490" in the first capture group.
Attempt #2:
capture\.kernel_packets.*Total.*\|\s+(\d+)\1
It seems to only grab the full match. The following works but is not supported by POSIX ERE:
capture\.kernel_packets.*Total.*\|\s+\K\S+
https://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_tail
http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/syntax/basic_extended.html
What am I overlooking? Thanks!
Upvotes: 0
Views: 699
Reputation: 2894
I think that your 1st attempt is close.
What I suspect you may be overlooking is the need to escape the string twice for use within collectd/tail. Let me explain.
First up, the collectd code is compiling the regex string you provide with the flags
REG_EXTENDED | REG_NEWLINE
But also, the string you provide in the tail.conf file in the Regex field is not the actual regex. It is a string suitable for use in the C language, so you have to be aware of the 2 separate levels of escaping.
1) The escaping required by the extended regex syntax: if you want to use one of these characters literally
.[{}()\*+?|^$
then you need to escape it with a \
So, for example, if you want the actual character '*', the regex requires '\*' so that the regex compiler knows you mean "asterisk" and not "zero-or-more repeat".
2) But also, you need the escaping required by the C language.
So to produce the actual character '|' in the regex string, you need to escape it like this: '\|'. And to provide that regex string in the tail.conf file as a C string, you need to escape it again: '\\|'.
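Note that \d is not part of POSIX ERE, so the digits are matched with [0-9] instead. (\s is not POSIX either; glibc accepts it as an extension, and [[:space:]] is the portable spelling.)
So you need this regex string: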
capture\.kernel_packets.*Total.*\|\s+([0-9]+)
which you would provide in your tail.conf, with the extra C escaping, as:
capture\\.kernel_packets.*Total.*\\|\\s+([0-9]+)
The whole string is matched, and the number you want ends up in group 1, which gives collectd the number it needs for parsing.
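To see the two layers in isolation, here is a minimal sketch in Python. It rests on two assumptions: unescape_config_string is a hypothetical, simplified model of the config-file layer (each backslash is dropped and the next character is kept), and Python's re module stands in for the POSIX ERE engine, which it is not, although this particular pattern behaves the same way in both. The function names, CONFIG_VALUE, and SAMPLE are illustrative and not part of collectd.

```python
import re


def unescape_config_string(s: str) -> str:
    """Hypothetical model of the config-file layer (an assumption):
    drop each backslash and keep the character after it literally."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            out.append(s[i + 1])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


# What would be typed in the Regex field. A raw string is used so that
# Python does not add an escaping layer of its own.
CONFIG_VALUE = r"capture\\.kernel_packets.*Total.*\\|\\s+([0-9]+)"

SAMPLE = (
    "capture.kernel_packets | Total | 25496291490\n"
    "capture.kernel_drops | Total | 873229305\n"
)


def extract(config_value: str, text: str):
    """Unescape the config value, search the text, return group 1 (or None)."""
    pattern = unescape_config_string(config_value)
    m = re.search(pattern, text)
    return m.group(1) if m else None


if __name__ == "__main__":
    # The regex the engine would actually see after the config layer:
    print(unescape_config_string(CONFIG_VALUE))
    # prints: capture\.kernel_packets.*Total.*\|\s+([0-9]+)
    print(extract(CONFIG_VALUE, SAMPLE))
    # prints: 25496291490
```

Under the same model, feeding in the single-escaped string turns '\|' into a bare '|', which the regex engine reads as alternation. The first alternative then matches the whole line and group 1 stays empty, which would be consistent with the "only grabs the full match" symptom described in the question.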
Upvotes: 1