Scuba_Steve

Reputation: 91

POSIX ERE regex using tail plugin within collectd

I am trying to match specific values in a file using the "tail" plugin for collectd. This plugin only supports POSIX ERE syntax. Sample file below:

capture.kernel_packets                     | Total                     | 25496291490
capture.kernel_drops                       | Total                     | 873229305

Attempt #1:

capture\.kernel_packets.*Total.*\|\s+(\d+)

I want to extract the value "25496291490" in the first capture group.

Attempt #2:

capture\.kernel_packets.*Total.*\|\s+(\d+)\1

It seems to only grab the full match. The following works but is not supported by POSIX ERE:

capture\.kernel_packets.*Total.*\|\s+\K\S+

https://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_tail
http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/syntax/basic_extended.html

What am I overlooking? Thanks!

Upvotes: 0

Views: 699

Answers (1)

Rachel

Reputation: 2894

I think that your 1st attempt is close.

What I suspect you may be overlooking is the need to escape the string twice for use within collectd/tail. Let me explain.

First, collectd compiles the regex string you provide with the flags

REG_EXTENDED | REG_NEWLINE

But also, the string you provide in the tail.conf file in the Regex field is not the actual regex. It is a string suitable for use in the C language, so you have to be aware of the 2 separate levels of escaping.

1) The escaping required by the Extended Regex syntax, e.g. if you want to match one of these characters literally

.[{}()\*+?|^$ 

then you need to escape it with a \

So for example if you want to match the actual character '*', then the regex requires that you write '\*' so the compiler knows you mean "asterisk" not "regex zero-or-more repeat".

2) But also, you need the escaping required by the C language.

So to produce the actual character '|' in the regex string, you need to escape it like this '\|'. And to provide that regex string in the tail.conf file as a C string, you need to escape it again '\\|'.
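The two levels can be sketched in Python. This is only an illustration under stated assumptions: `unescape_config_string` is a hypothetical stand-in for what the config parser does with a quoted string (assumed here to collapse each `\\` into a single `\`), and Python's `re` module is not a POSIX ERE engine, although it treats an escaped `\|` the same way.

```python
import re


def unescape_config_string(raw: str) -> str:
    """Hypothetical stand-in for the config parser: collapse each doubled backslash into one."""
    return raw.replace("\\\\", "\\")


# What you would type between the quotes in the config file (both levels of escaping).
typed_in_config = r"Total.*\\|"

# What the regex compiler receives once the config string has been unescaped.
regex_seen_by_compiler = unescape_config_string(typed_in_config)
print(regex_seen_by_compiler)

# The escaped pipe matches a literal '|' instead of acting as alternation.
matches_literal_pipe = re.search(regex_seen_by_compiler, "| Total   | 42") is not None
print(matches_literal_pipe)
```

If the second backslash were left out of the config string, the compiler would see a bare `|`, which means alternation rather than a literal pipe.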

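So you need this regex string (note [0-9] in place of \d: \d is not part of POSIX ERE, and \s is not standard either, so use [[:space:]] if your regex library rejects it):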

capture\.kernel_packets.*Total.*\|\s+([0-9]+)

Which you would provide in your tail.conf, with the extra C escaping, as:

capture\\.kernel_packets.*Total.*\\|\\s+([0-9]+)

The whole string is matched, and the number you want ends up in group 1, which gives collectd the number it needs for parsing.
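As a quick sanity check, the singly escaped regex can be tried against the sample lines from the question. This is a minimal sketch in Python, whose `re` module is not a POSIX ERE engine; the assumption is that this particular pattern behaves the same in both, since it uses only `.`, `*`, `+`, a bracket expression, one group and `\s`.

```python
import re

# Sample lines from the question.
sample = (
    "capture.kernel_packets                     | Total                     | 25496291490\n"
    "capture.kernel_drops                       | Total                     | 873229305\n"
)

# The regex as the compiler should see it (regex-level escaping only).
pattern = re.compile(r"capture\.kernel_packets.*Total.*\|\s+([0-9]+)")

# '.' does not cross a newline here, so the match stays on the first line.
match = pattern.search(sample)
value = match.group(1) if match else None
print(value)
```

The full match covers the whole `capture.kernel_packets` line, while group 1 holds only the digits.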

Upvotes: 1
